Nonlinear Hawkes Processes 



by

Lingjiong Zhu

A dissertation submitted in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Mathematics
New York University
May 2013




Professor S. R. S. Varadhan 



© Lingjiong Zhu 
All Rights Reserved, 2013 



To the memory of my grandpa 
Zhixuan Zhu (1923-2001) 






Acknowledgements 

It is difficult to overstate my gratitude to my adviser Professor Varadhan. 
Working with Professor Varadhan has been an absolutely amazing experience for 
me. I thank him for always keeping his door open and patiently answering my 
questions. I thank him for his superb guidance, understanding, and generosity. 
I thank him for suggesting the topic for my thesis, which would not be possible 
without his deep wisdom and sharing of many new ideas. He has been everything 
that one can reasonably ask for in an advisor and more, and I am truly grateful 
to him. 

I want to thank the Courant community for guiding me through this process 
and for putting up with me in general. Tamar Arnon does her job exceptionally 
well and her efforts are much appreciated. I want to thank the faculty for many 
well taught and interesting classes. I am indebted to Gerard Ben Arous, Sourav 
Chatterjee and Raghu Varadhan for writing me recommendations for my first 
academic job. I also want to thank Peter Carr for his interest in my thesis. 

I remember a joke told by Jalal Shatah that the most important thing as an 
undergraduate student is to go to a top graduate program. But once you are 
already at a graduate school, the most important thing is to get out of it! This 
would not be possible without the final step, i.e. thesis defense! I am grateful to 
have Henry McKean, Chuck Newman and Raghu Varadhan as the three readers 
and Gerard Ben Arous and Lai-Sang Young as the two non-readers on my thesis 
committee. 

Most importantly, I want to thank my fellow colleagues for all the fun memories 
that I take with me from Courant. New York City, without good friends, can be 
the most populated lonely place in the world, but thankfully the constant friendship
of my fellow Courant colleagues has made these five years some of the most
entertaining and pleasurable of my life. I thank Antoine Cerfon, Shirshendu Chat- 
terjee, Oliver Conway, Sinziana Datcu, Partha Dey, Thomas Fai, Max Fathi, Mert 
Gürbüzbalaban, Matan Harel, Miranda Holmes-Cerfon, Arjun Krishnan, Shoshana 
Leffler, Sandra May, Jim Portegies, Alex Rozinov, Patrick Stewart, Adam Stinch- 
combe, Jordan Thomas, Chen-Hung Wu and many others for their friendship. In 
particular, I want to thank Dmytro Karabash, Behzad Mehrdad and Sanchayan 
Sen. They are not only my good friends, but coauthors as well. I also thank my 
office neighbor Cheryl Sylivant for her friendship. 

By living in New York City, I had the great opportunities to visit as many 
museums and go to as many concerts as possible. I am grateful to the New York 
Philharmonic and Metropolitan Opera House for their student ticket offers and 
also many wonderful student recitals and concerts at Juilliard School, which have 
made my stay in New York City much more enjoyable. 

I also want to thank the professors at the University of Cambridge, who pro- 
vided me a solid undergraduate education. In particular, I am grateful to Rachel 
Camina, as well as Houshang Ardavan and Tom Körner. I also want to thank Stefano 
Luzzatto for supervising me on an undergraduate research project at Imperial 
College, London. 

I am very much indebted to my family back home. I thank my parents for so 
many years of love and understanding. They are truly the best parents one could 
ask for. I also thank my grandmas, uncles and aunts for their support. Finally, I 
dedicate this thesis to the memory of my late grandpa. I miss him dearly. 



Abstract 

The Hawkes process is a simple point process that has long memory, clustering 
effect, self-exciting property and is in general non-Markovian. The future evolution 
of a self-exciting point process is influenced by the timing of the past events. There 
are applications in finance, neuroscience, genome analysis, seismology, sociology, 
criminology and many other fields. We first survey the known results about the 
theory and applications of both linear and nonlinear Hawkes processes. Then, we 
obtain the central limit theorem and process-level, i.e. level-3 large deviations for 
nonlinear Hawkes processes. The level-1 large deviation principle holds as a result 
of the contraction principle. We also provide an alternative variational formula for 
the rate function of the level-1 large deviations in the Markovian case. Next, we 
drop the usual assumptions on the nonlinear Hawkes process and categorize it into 
different regimes, i.e. sublinear, sub-critical, critical, super-critical and explosive 
regimes. We show the different time asymptotics in different regimes and obtain 
other properties as well. Finally, we study the limit theorems of linear Hawkes 
processes with random marks. 



Introduction 

This thesis is about the nonlinear Hawkes process, a simple point process that 
has long memory, a clustering effect and the self-exciting property, and is in 
general non-Markovian. The future evolution of a self-exciting point process is 
influenced by the timing of the past events. There are applications in finance, 
neuroscience, genome analysis, sociology, criminology, seismology, and many other 
fields. 

Chapter 1 introduces the model and surveys the results already known in the 
literature about Hawkes processes. These include the stability results, limit 
theorems and power spectra of linear Hawkes processes, and the stability 
results of nonlinear Hawkes processes. 

Chapter 2 is about the functional central limit theorem for nonlinear Hawkes 
processes. A Strassen's invariance principle holds under the same assumptions. 
The work in Chapter 2 is based on Zhu [113]. 

Chapter 3 is dedicated to the process-level large deviations, i.e. level-3 large 
deviations, of nonlinear Hawkes processes. The proofs consist of the proofs of 
the lower bound, the upper bound and the superexponential estimates. The level-1 
large deviation principle is derived as a result of the contraction principle. This 
chapter is based on Zhu [112]. 

Chapter 4 is dedicated to the study of the level-1 large deviation principle for 
nonlinear Hawkes processes when the exciting functions are exponential or sums 
of exponentials. It is based on the observation that when the exciting functions are 
exponential or sums of exponentials, the process is Markovian, and a combination of 
the Feynman-Kac formula for the upper bound of large deviations of Markov processes 
and tilting of the intensity function of Hawkes processes for the lower bound 
establishes a level-1 large deviation principle with the rate function expressed in 
terms of a variational formula. This chapter is based on Zhu [111]. 

Chapter 5 is about the asymptotics for nonlinear Hawkes processes. In this 
chapter, we drop the usual assumptions on nonlinear Hawkes processes, and study 
the phase transitions in different regimes. We categorize nonlinear Hawkes pro- 
cesses into the following regimes: sublinear regime, sub-critical regime, critical 
regime, super-critical regime and explosive regime. Different time asymptotics 
and various properties are obtained in different regimes. This chapter is based on 
Zhu [HE] . 

Chapter 6 is about the limit theorems for linear Hawkes processes with random 
marks. The central limit theorem and the large deviation principle are derived. 
We end this chapter with a simple application to a risk model. This is based on 
joint work with my colleague Dmytro Karabash, see [62]. 

During my time as a PhD student at the Courant Institute, I have had the joy of 
working on some other problems, either by myself or with my colleagues. For 
example, I studied the large deviations of self-correcting point processes with 
Sanchayan Sen, see [100], and also did some work on biased random walks on 
Galton-Watson trees without leaves with Behzad Mehrdad and Sanchayan Sen, see 
[75]. But since they are not closely related to the topics of my thesis, I do not 
include them here. 



Chapter 1 



Hawkes Processes 



1.1 Introduction 

The Hawkes process is a self-exciting simple point process first introduced by 
Hawkes [51]. The future evolution of a self-exciting point process is influenced 
by the timing of past events. The process is non-Markovian except for some very 
special cases. In other words, the Hawkes process depends on the entire past 
history and has a long memory. The Hawkes process has wide applications in 
neuroscience, see e.g. Johnson [59], Chornoboy et al. [25], Pernice et al. [93], 
Pernice et al. [94], Reynaud et al. [98]; seismology, see e.g. Hawkes and 
Adamopoulos [53], Ogata [87], Ogata [88], Ogata et al. [90]; genome analysis, see 
e.g. Gusto and Schbath [46], Reynaud-Bouret and Schbath [96]; psychology, see e.g. 
Halpin and De Boeck [31]; spread of infectious disease, see e.g. Meyer et al. [76]; 
finance, see e.g. Bauwens and Hautsch [7], Bowsher [13], Hewlett [56], Large [67], 
Cartea et al. [22], Chavez-Demoulin et al. [23], Errais et al. [36], Embrechts et 
al. [35], Muni Toke and Pomponio [83], 
Bacry et al. [3], [3], [Tj; and in many other fields. 



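Let $N$ be a simple point process on $\mathbb{R}$ and let 
$\mathcal{F}^{-\infty}_t := \sigma(N(C), C \in \mathcal{B}(\mathbb{R}), C \subset (-\infty, t])$ 
be an increasing family of $\sigma$-algebras. Any nonnegative 
$\mathcal{F}^{-\infty}_t$-progressively measurable process $\lambda_t$ with 

(1.1) $\mathbb{E}\left[N(a,b] \mid \mathcal{F}^{-\infty}_a\right] = \mathbb{E}\left[\int_a^b \lambda_s\,ds \,\Big|\, \mathcal{F}^{-\infty}_a\right]$ 

a.s. for all intervals $(a, b]$ is called the $\mathcal{F}^{-\infty}_t$-intensity of $N$. 
We use the notation $N_t := N(0,t]$ to denote the number of points in the 
interval $(0, t]$. 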

A nonlinear Hawkes process is a simple point process $N$ admitting an 
$\mathcal{F}^{-\infty}_t$-intensity 

(1.2) $\lambda_t := \lambda\left(\int_{-\infty}^{t} h(t-s)\,N(ds)\right)$, 

where $\lambda(\cdot) : \mathbb{R}^+ \to \mathbb{R}^+$ is locally integrable and left 
continuous, and $h(\cdot) : \mathbb{R}^+ \to \mathbb{R}^+$. We always assume that 
$\|h\|_{L^1} = \int_0^\infty h(t)\,dt < \infty$ unless otherwise specified. Here 
$\int_{-\infty}^{t} h(t-s)N(ds)$ stands for $\int_{(-\infty,t)} h(t-s)N(ds)$, which is 
important for $\mathcal{F}^{-\infty}_t$-predictability. The local integrability 
assumption on $\lambda(\cdot)$ is to avoid explosion, and the left continuity 
assumption on $\lambda(\cdot)$ is to ensure that the process is 
$\mathcal{F}^{-\infty}_t$-predictable. 

In the literature, $h(\cdot)$ and $\lambda(\cdot)$ are usually referred to as the 
exciting function and the rate function respectively. 

A Hawkes process is said to be linear if $\lambda(\cdot)$ is linear, and it is 
nonlinear otherwise. For a linear Hawkes process, we can assume that the 
intensity is 

(1.3) $\lambda_t := \nu + \int_{(-\infty,t)} h(t-s)\,N(ds)$. 


In this thesis, unless otherwise specified, we assume the following. 

• $\lambda(\cdot) : \mathbb{R}^+ \to \mathbb{R}^+$ is continuous and non-decreasing. 

• $h(\cdot) : \mathbb{R}^+ \to \mathbb{R}^+$ is continuous and non-increasing. 

• $N(-\infty, 0] = 0$, i.e. the Hawkes process has empty past history. 

Throughout, we define $Z_t$ as $Z_t := \int_{(-\infty,t)} h(t-s)\,N(ds)$. Thus, 
$\lambda_t = \lambda(Z_t)$. 

The first assumption says that the occurrence of past and present events has a 
positive impact on the occurrence of future events. The second assumption says 
that as time evolves, the impact of the past events decreases. For most of the 
results in this thesis, these two assumptions may not be necessary. We nevertheless 
make them to avoid some technical difficulties. 
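
As a small illustration of the definition $\lambda_t = \lambda(Z_t)$, the following Python sketch evaluates the intensity at a time $t$ from a finite list of past event times. The function name `hawkes_intensity`, the particular kernel $h(u) = 1/(u+1)^2$, the rate function $\lambda(z) = 1 + 0.9z$ and the event times are illustrative assumptions, not part of the thesis; only events strictly before $t$ enter the sum, mirroring the integral over $(-\infty, t)$.

```python
def hawkes_intensity(t, event_times, h, rate):
    """Return lambda_t = rate(Z_t), where Z_t sums h(t - s) over events s < t."""
    # Strict inequality s < t: the integral in (1.2) is over (-infinity, t),
    # so an event occurring exactly at time t does not yet contribute.
    z = sum(h(t - s) for s in event_times if s < t)
    return rate(z)


# Illustrative (assumed) choices: a non-increasing kernel and a
# non-decreasing rate function, as in the standing assumptions.
def h(u):
    return 1.0 / (u + 1.0) ** 2


def rate(z):
    return 1.0 + 0.9 * z


events = [1.0, 1.5, 4.0]  # hypothetical event times, empty history before 0

print(hawkes_intensity(0.5, events, h, rate))  # no past events, so rate(0)
print(hawkes_intensity(2.0, events, h, rate))  # two past events contribute
```

With an empty history the intensity starts at $\lambda(0)$ and is pushed up by each event, which is the self-exciting mechanism in its simplest form.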



If one looks at (1.2), it is clear that when you witness some events occurring, 
$\lambda_t$ increases since $\lambda(\cdot)$ is increasing, and you would expect 
even more events to occur. This is called the self-exciting property. Because of 
this, you would expect to see some clustering effects. 



Figure 1.1 shows the histograms of a Hawkes process and a usual Poisson process. 
A Poisson process is stationary with independent increments. In contrast, the 
Hawkes process has dependent increments and has clustering effects. As a result, 
in the picture, the Poisson process is more or less flat whilst the Hawkes 
process has peaks when it gets "excited" and has valleys when it "cools down". 



Figure 1.2 shows the plot of the intensity $\lambda_t$ of a Hawkes process. Unlike 
the usual Poisson process, for which the intensity is a positive constant, the 
intensity of the Hawkes process increases when you witness arrivals of points and 
it decays when there are no arrivals of points. 
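
The rise-and-decay behaviour of the intensity can be reproduced with a short simulation. The sketch below uses a thinning (acceptance-rejection) scheme, which is a standard technique and not a method taken from this thesis. It relies on the standing assumptions that $h(\cdot)$ is non-increasing and $\lambda(\cdot)$ is non-decreasing, so that the intensity evaluated just after the current time bounds the intensity until the next event. The function name `simulate_hawkes`, the kernel, the rate function and all parameter values are assumptions chosen for illustration.

```python
import random


def simulate_hawkes(T, h, rate, seed=0):
    """Simulate event times on (0, T] by thinning, starting from empty history.

    Assumes h is non-increasing and rate is non-decreasing, so that the
    intensity can only decay between consecutive events.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # Upper bound on the intensity until the next event: evaluate the
        # sum at the current time, including an event at time t itself.
        lam_bar = rate(sum(h(t - s) for s in events))
        t += rng.expovariate(lam_bar)  # candidate point from a rate-lam_bar clock
        if t > T:
            return events
        lam_t = rate(sum(h(t - s) for s in events))  # true intensity at candidate
        if rng.random() * lam_bar <= lam_t:
            events.append(t)  # accept with probability lam_t / lam_bar


def h(u):
    return 1.0 / (u + 1.0) ** 2  # assumed kernel


def rate(z):
    return 1.0 + 0.9 * z  # assumed rate function


points = simulate_hawkes(20.0, h, rate, seed=1)
print(len(points), "events on (0, 20]")
```

Each evaluation of the intensity costs time proportional to the number of past events, so this sketch is only meant for short horizons.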



The self-exciting and clustering properties of the Hawkes process make it ideal 
for characterizing the correlations in some complex systems, including the default 
clustering effect in finance. 

One generalization of the classical linear Hawkes process is the so-called 
multivariate Hawkes process. We will define the multivariate Hawkes process and 
discuss some basic results in Section 1.6 of Chapter 1. The multivariate Hawkes process 



has been well studied in the literature and we would like to point out that if you 
have the result for the univariate Hawkes process, mathematically, it is not too 
difficult to generalize your result to multivariate Hawkes process. 

Unlike the univariate Hawkes process, which only has the self-exciting property, 
the multivariate Hawkes process also has the mutually-exciting property. In an 
industry context, suppose you hold a large portfolio of companies; the failure 
of one company can then have an impact on the performance of the other companies. 
In other words, the multivariate Hawkes process captures the cross-sectional 
clustering effect. That is why most applications of Hawkes processes in finance 
consider multivariate Hawkes processes. We will review some basic results 
about the multivariate linear Hawkes process in Chapter 1. 

Another possible generalization of the Hawkes process is the marked Hawkes 
process, i.e. the Hawkes process with random marks. Just as with the univariate 
Hawkes process versus the multivariate Hawkes process, if you have a result for 
the unmarked Hawkes process, it can usually be generalized to the marked Hawkes 
process without much difficulty. For instance, the large deviation principle for 
the linear Hawkes process is proved in Bordenave and Torrisi [11] and the large 
deviation principle for the linear marked Hawkes process is then proved in 
Karabash and Zhu [62]. We will discuss the details of the limit theorems for the 
linear marked Hawkes process in Chapter 6. 



Most of the literature on Hawkes processes studies only the linear case, which 
has an immigration-birth representation (see Hawkes and Oakes [54]). The 
stability, law of large numbers, central limit theorem, large deviations, Bartlett 
spectrum etc. have all been studied and are very well understood. Almost all of 
the applications of Hawkes processes in the literature consider exclusively the 
linear case. Daley and Vere-Jones [27] and Liniger [71] provide nice surveys of 
the theory and applications of Hawkes processes. 

One special case of the Hawkes process is when the exciting function $h(\cdot)$ is 
exponential. In this case, the Hawkes process is a continuous time Markov process. 
If $\lambda(\cdot)$ is linear, the process is a special case of an affine 
jump-diffusion process and is analytically tractable. This special case was for 
example studied in Oakes [86] and Errais et al. [36]. 
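
To see why an exponential exciting function gives a Markov process, write $h(t) = \alpha e^{-\beta t}$ (the symbols $\alpha$ and $\beta$ are notation assumed here for illustration). Then $Z_t$ decays by the factor $e^{-\beta (t-s)}$ between events and jumps by $\alpha$ at each event, so its future depends on the past only through its current value. The following sketch, with hypothetical function names and event times, checks numerically that this recursive update agrees with the direct sum over the whole history.

```python
import math


def z_direct(t, events, alpha, beta):
    """Z_t as a sum over the entire past: sum of alpha * exp(-beta (t - s)), s < t."""
    return sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)


def z_recursive(t, events, alpha, beta):
    """Z_t via the Markov update: exponential decay between events, jump alpha at events."""
    z, last = 0.0, 0.0
    for s in sorted(events):
        if s >= t:
            break
        z = z * math.exp(-beta * (s - last)) + alpha  # decay up to s, then jump
        last = s
    return z * math.exp(-beta * (t - last))  # decay from the last event to t


events = [0.3, 1.1, 2.7]   # hypothetical event times
alpha, beta = 0.8, 1.5     # assumed kernel parameters

for t in (0.1, 2.0, 5.0):
    print(t, z_direct(t, events, alpha, beta), z_recursive(t, events, alpha, beta))
```

The recursive form needs only the current value of $Z_t$ and the time elapsed, which is what makes the exponential case tractable.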

Because of the lack of computational tractability and immigration-birth repre- 
sentation, nonlinear Hawkes process is much less studied. However, some efforts 
have already been made in this direction. For instance, see Bremaud and Mas- 
soulie [H] for stability results, and Bremaud et al. [T5] for the rate of convergence 
to stationarity. Karabash [U3] recently proved the stability results for a wider class 
of nonlinear Hawkes processes. 

As to the limit theorems, Bacry et al. [2] proved the central limit theorem for 
linear Hawkes process and Bordenave and Torrisi [11] proved the large deviation 
principle for linear Hawkes process. 

For the nonlinear Hawkes process, there is no explicit expression for the variance 
in the central limit theorem or the rate function for the large deviation principle. 
The method is more abstract and much more involved. Zhu [113] proved a central 
limit theorem for ergodic nonlinear Hawkes processes. Zhu [111] studied the 
large deviations in the Markovian case, i.e. when $h(\cdot)$ is exponential or a 
sum of exponentials. And Zhu [112] proved the large deviation principle for more 
general nonlinear Hawkes processes at the process-level, i.e. level-3. 

Figure 1.1: This is a comparison of a Hawkes process with a Poisson process. The 
figure on the left shows the histogram of a Hawkes process with 
$h(t) = \frac{1}{(t+1)^2}$ and $\lambda(z) = 1 + \frac{9}{10}z$, and the figure on 
the right shows the histogram of a Poisson process with constant intensity 
$\lambda$. In the figure, each column represents the number of points that 
arrived in that unit time subinterval. 




Figure 1.2: Plot of the intensity $\lambda_t$ for a realization of a Hawkes 
process. Here $h(t) = \frac{1}{(t+1)^2}$ and $\lambda(z) = 1 + 0.9z$. 



1.2 Applications of Hawkes Processes 

1.2.1 Applications in Finance 

The applications of Hawkes processes in finance include market order modelling, 
see e.g. Bauwens and Hautsch [7], Bowsher [13], Hewlett [56], Large [67] and 
Cartea et al. [22]; value-at-risk, see e.g. Chavez-Demoulin et al. [23]; and credit 
risk, see e.g. Errais et al. [36]. Embrechts et al. [35] applied Hawkes processes to 
model financial data. Muni Toke and Pomponio [83] applied Hawkes processes 
to model trade-throughs. Bacry et al. [3] used Hawkes processes to reproduce 
microstructure noise empirically and discussed the Epps effect and lead-lag. The 
self-exciting and clustering properties of Hawkes processes are especially appealing 
in financial applications. 

Currently, most of the applications of the Hawkes process in the finance literature 
concern market order modelling, see e.g. Bauwens and Hautsch [7], Bowsher 
[13] and Large [67]. 

Recently, Chavez-Demoulin and McGill [24] used Hawkes processes to study 
the extremal returns in high-frequency trading. The Hawkes process captures 
the volatility clustering behavior of the intraday extremal returns, and provides 
a suitable estimation of high-quantile based risk measures (e.g. VaR, ES) for 
financial time series. 

Filimonov and Sornette [39] used Hawkes process to model market events, with 
the aim of quantifying precisely endogeneity and exogeneity in market activity. By 
using Hawkes process, Filimonov and Sornette [39] analyzed E-mini S&P futures 
contract over the period 1998-2010 and discovered that the degree of self-reflexivity 
has increased steadily in the last decade, an effect they attribute to the increased 



deployment of high-frequency and algorithmic trading. When they calibrated over 
much shorter time intervals (10 minutes), the Hawkes process analysis is found to 
detect precursors of the flash-crash that happened on May 6th, 2010. An early 
detection can benefit market regulators. 

Very recently, Hardiman et al. [50] used the (linear) Hawkes process to model the 
arrival of mid-price changes in the E-Mini S&P futures contract. Using several 
estimation methods, they found that the exciting function $h(\cdot)$ has a power-law 
decay and that $\|h\|_{L^1}$ is close to 1. They pointed out that markets are and 
have always been close to criticality, challenging the study of Filimonov and 
Sornette [39], which indicates that self-reflexivity (endogeneity) has increased in 
recent years as a result of increased automation of trading. 

Egami et al. [33] studied the credit default swap (CDS) markets in both Japan 
and U.S. They made a dynamic analysis of the bid-ask spreads in both countries, 
which surged dramatically during the 2008-2009 financial crisis and they used the 
Hawkes process to predict the bid-ask spreads. 

As pointed out in Errais et al. [36], "The collapse of Lehman Brothers brought 
the financial system to the brink of a breakdown. The dramatic repercussions point 
to the existence of feedback phenomena that are channeled through the complex 
web of informational and contractual relationships in the economy... This and 
related episodes motivate the design of models of correlated default timing that 
incorporate the feedback phenomena that plague credit markets." According to 
Peng and Kou [92], "We need better models to incorporate the default clustering 
effect, i.e., one default event tends to trigger more default events both across 
time and cross-sectionally." The Hawkes process provides a model to characterize 
default events across time, and a multivariate Hawkes process would describe 
the cross-sectional clustering effect as well. 

Hawkes processes have been proposed as models for the arrival of company 
defaults in a bond portfolio, starting with the papers of Giesecke and Tomecek [42] 
and Giesecke et al. [41]. It is not hard to see that when the exciting function 
$h(\cdot)$ is exponential, the linear Hawkes processes are affine jump-diffusion 
processes, see for instance Errais et al. [36]. With the help of the theory of 
affine jump-diffusions, one can then analyze price processes related to certain 
credit derivatives analytically. 

1.2.2 Applications in Sociology 

The Hawkes process has also been applied to the study of social interactions. 
Crane and Sornette [26] analysed the viewing of YouTube videos as an example of a 
nonlinear social system. They identified peaks in the time series of viewing figures 
for around half a million videos and studied the subsequent decay of the peak to 
a background viewing level. In Crane and Sornette [26], the Hawkes process was 
proposed as a model of the video-watching dynamics, and a plausible link made to 
the social interactions that create strong correlations between the viewing actions 
of different people. Individual viewing is not random but influenced by various 
channels of communication about what to watch next. Mitchell and Cates [77] 
used computer simulation to test the claims in Crane and Sornette [26] that 
robust identification is possible for classes of dynamic response following activity 
bursts. They also pointed out some limitations of the analysis based on the Hawkes 
process. 

In sociology, the Hawkes process has also been used by Blundell et al. [10] to 
study reciprocating relationships. Reciprocity is a common social norm, where 
one person's actions towards another increase the probability of the same type of 
action being returned, e.g., if Bob emails Alice, it increases the probability that 
Alice will email Bob in the near future. The mutually-exciting processes, e.g. 
multivariate Hawkes processes, are able to capture the causal nature of reciprocal 
interactions. 

1.2.3 Applications in Seismology 

Ogata [87] used a particular case of the Hawkes process to predict earthquakes, 
and by residual analysis the Hawkes process appears to be superior to other models. 
The specific model used by Ogata [87] is now known as the ETAS (Epidemic Type 
Aftershock-Sequences) model. Discussions of the ETAS model can be found in 
Daley and Vere-Jones [27]. 



1.2.4 Applications in Genome Analysis 



Gusto and Schbath [46] used the Hawkes process to model occurrences along 
the genome and studied how the occurrences of a given process along a genome, 
genes or motifs for instance, may be influenced by the occurrences of a second 
process. More precisely, the aim is to detect avoided and/or favored distances 
between two motifs, for instance, suggesting possible interactions at a molecular 
level. The statistical method proposed by Gusto and Schbath [46] is useful for 
functional motif detection or to improve knowledge of some biological mechanisms. 

Reynaud-Bouret and Schbath [96] provided a new method for the detection of 
either favored or avoided distances between genomic events along DNA sequences. 
These events are modeled by the Hawkes process. The biological problem is actu- 
ally complex enough to need a non-asymptotic penalized model selection approach 
and Reynaud-Bouret and Schbath [96] provided a theoretical penalty that satisfies 
an oracle inequality even for quite complex families of models. 

1.2.5 Applications in Neuroscience 

Chornoboy et al. [25] used the Hawkes process to detect and model the func- 
tional relationships between the neurons. The estimates are based on the maximum 
likelihood principle. 

In most neural systems, neurons communicate via sequences of action potentials. 
Johnson [59] used various point processes, including the Poisson process, the 
renewal process and the Hawkes process, and showed that neural discharge patterns 
convey time-varying information intermingled with the neuron's response 
characteristics. By applying information theory and estimation theory to point 
processes, Johnson [59] described the fundamental limits on how well information 
can be extracted from neural discharges. 

More recently, Pernice et al. [93] and Pernice et al. [94] have used the Hawkes 
process to model the spike train dynamics in the studies of neuronal networks. 
As pointed out in Pernice et al. [93], "Hawkes' point process theory allows the 
treatment of correlations on the level of spike trains as well as the understanding 
of the relation of complex connectivity patterns to the statistics of pairwise 
correlations." Reynaud et al. [98] proposed new non-parametric adaptive estimation 
methods and adapted other recent similar results to the setting of spike train 
analysis in neuroscience. They tested the homogeneous Poisson process, the 
inhomogeneous Poisson process and the Hawkes process. A complete analysis was 
performed on single unit activity recorded on a monkey during a sensory-motor 
task. Reynaud et al. [98] showed that the homogeneous Poisson process hypothesis 
is always rejected and that the inhomogeneous Poisson process hypothesis is rarely 
accepted. The Hawkes model seems to fit most of the data. 

The application of the Hawkes process in neuroscience has also been mentioned 
in Bremaud and Massoulie [13]. 

1.2.6 Applications in Criminology 

Hawkes processes have also been used in criminology. Violence among gangs 
exhibits retaliatory behavior, i.e. given that an event has happened between two 
gangs, the likelihood that another event will happen shortly afterwards is increased. 
A problem like this can be modeled naturally by a self-exciting point process. 
Mohler et al. [78] and Egesdal et al. [31] have successfully modeled pairwise 
gang violence as a Hawkes process. As pointed out in Hegemann et al. [55], in 
real-life situations, data is incomplete and law-enforcement agencies may not know 
which gang is involved. However, even when gang activity is highly stochastic, 
localized excitations in parts of the known dataset can help identify gangs 
responsible for unsolved crimes. The works before Hegemann et al. [55] 
incorporated the observed clustering in time of the data to identify gangs 
responsible for unsolved crimes by assuming that the parameters of the model are 
known, when in reality they have to be estimated from the data itself. Hegemann 
et al. [55] proposed an iterative method that simultaneously estimates the 
parameters in the underlying point process and assigns weights to the unknown 
events with a directly calculable score function. 

Hawkes processes have also been used in the study of terrorist activities. For 
example, Porter and White [95] used the Hawkes process to examine the daily number 
of terrorist attacks in Indonesia from 1994 through 2007. Their model explains the 
self-exciting nature of the terrorist activities. It estimates the probability of future 
attacks as a function of the times since the past attacks. 

Lewis et al. [69] used Hawkes process to model the temporal dynamics of 
violence and civilian deaths in Iraq. 

1.3 Related Models 

There are other generalizations or variations of the Hawkes processes in the 
literature. For example, Bormetti et al. [12] introduced a one factor model where 
both the factor and the idiosyncratic jump components are described by a Hawkes 
process. Their model is a better candidate than classical Poisson or Hawkes models 
to describe the dynamics of jumps in a multi-asset framework. Another example 
is a multivariate Hawkes process with constraints on its conditional density 
introduced by Zheng et al. [110]. Their study is mainly motivated by the stochastic 
modelling of a limit order book for high frequency financial data analysis. Dassios 
and Zhao [28] proposed a dynamic contagion process. It is basically a combination 
of a marked Hawkes process with exponential exciting function and an external 
shot noise process. Their model is Markovian. They also applied their model to 
insurance, see e.g. Dassios and Zhao [25]. 

In [115], Zhu incorporated Hawkes jumps into the classical Cox-Ingersoll-Ross 
model and obtained limit theorems and various other properties. 

In seismology, Wang et al. [104] proposed a new model, i.e. the Markov-modulated 
Hawkes process with stepwise decay (MMHPSD), to investigate the variation in 
seismicity rate during a series of earthquake sequences including multiple main 
shocks. The MMHPSD is a self-exciting process which switches among different 
states, in each of which the process has distinguishable background seismicity 
and decay rates. Stress release models are often used in seismology. In 
Bremaud and Foss [UJ , they created a new earthquake model combining the clas- 
sical stress release model for primary shocks with the Hawkes model for aftershocks 
and studied the ergodicity of this new model. 

In addition to the classical Hawkes process, one can also study the spatial 
Hawkes process, see e.g. Møller and Torrisi [81], Møller and Torrisi [82], Bordenave 
and Torrisi [11]. In addition, the space-time Hawkes process has been used, see 
e.g. Musmeci and Vere-Jones [84] and Ogata [88]. 

1.4 Linear Hawkes Processes 

In this section, let us review some known results about the linear Hawkes process. 
Unlike the nonlinear Hawkes process, the linear Hawkes process has been very well 
studied in the literature. Hawkes and Oakes [54] introduced an immigration-birth 
representation of the linear Hawkes process, which can be viewed as a special case 
of the Poisson cluster process. The stability results for the linear Hawkes process, 
i.e. existence and uniqueness of a stationary linear Hawkes process, have been 
summarised in Chapter 12 of Daley and Vere-Jones [27]. The rate of convergence to 
equilibrium has been studied by Bremaud et al. [15]. The second-order analysis, 
i.e. the Bartlett spectrum etc., has been studied in Hawkes [51] and Hawkes [52]. 
Reynaud-Bouret and Roy [97] considered the linear Hawkes process as a special 
case of the Poisson cluster process and studied the non-asymptotic tail estimates of 
the extinction time, the length of a cluster, and the number of points in an 
interval. Reynaud-Bouret and Roy [97] also obtained some so-called non-asymptotic 
ergodic theorems. The limit theorems have also been studied for the linear Hawkes 
process. The central limit theorem was considered in Bacry et al. [5], the large 
deviation principle was obtained in Bordenave and Torrisi [11], and very recently 
the moderate deviation principle was proved in Zhu [114]. The simulation and 
calibration of the linear Hawkes process have been studied in Ogata [89], Møller and 
Rasmussen [50], [79], Vere-Jones [109], Ozaki [91] and many others. 

1.4.1 Immigration-Birth Representation

Consider the linear Hawkes process $N$ with empty history, i.e. $N(-\infty, 0] = 0$,
and intensity

(1.4)  $\lambda_t = \nu + \int_0^t h(t-u)\,N(du), \quad \nu > 0,$

where $\int_0^\infty h(t)\,dt < 1$. It is well known that it has the following immigration-birth
representation; see for example Hawkes and Oakes [54]. Immigrants arrive
according to a homogeneous Poisson process with constant rate $\nu$. Each immigrant
produces children, and the number of children has a Poisson distribution with
parameter $\|h\|_{L^1}$. Conditional on the number of children of an immigrant,
the time at which each child is born, measured from the arrival of its parent, has
probability density function $h(t)/\|h\|_{L^1}$. Each child
produces children according to the same laws, independently of the other children. All
the immigrants produce children independently. Now, $N(0,t]$ is the same as the
total number of immigrants and children in the time interval $(0, t]$.
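
The immigration-birth representation translates directly into a simulation recipe. The following is a minimal sketch, assuming the exponential kernel $h(t) = \alpha e^{-\beta t}$ (an illustrative choice that is not made in the text), so that $\|h\|_{L^1} = \alpha/\beta$ and the delay between a parent and each of its children is exponential with rate $\beta$. The function names are hypothetical.

```python
import random


def poisson(rng, mean):
    """Sample a Poisson(mean) variate by counting unit-rate exponential arrivals."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def simulate_cluster_hawkes(nu, alpha, beta, horizon, seed=0):
    """Points on (0, horizon] of a linear Hawkes process with empty history,
    built from the immigration-birth representation.

    Assumption of this sketch: h(t) = alpha * exp(-beta * t), so that
    ||h||_{L^1} = alpha / beta, which must be < 1.
    """
    if not alpha < beta:
        raise ValueError("need alpha / beta < 1")
    rng = random.Random(seed)
    # Immigrants: homogeneous Poisson process with rate nu on (0, horizon].
    points, t = [], rng.expovariate(nu)
    while t <= horizon:
        points.append(t)
        t += rng.expovariate(nu)
    # Every point (immigrant or child) reproduces according to the same law.
    queue = list(points)
    while queue:
        parent = queue.pop()
        for _ in range(poisson(rng, alpha / beta)):
            child = parent + rng.expovariate(beta)  # delay has density h / ||h||
            if child <= horizon:  # later descendants of a dropped child are also > horizon
                points.append(child)
                queue.append(child)
    return sorted(points)
```

For example, `simulate_cluster_hawkes(1.0, 0.5, 1.0, 2000.0)` should produce roughly $\nu/(1-\|h\|_{L^1}) = 2$ points per unit time, up to random fluctuation and a small start-up effect from the empty history.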

1.4.2 Stability Results

Consider the linear Hawkes process $N$ with empty history, i.e. $N(-\infty, 0] = 0$,
and intensity

(1.5)  $\lambda_t = \nu + \int_0^t h(t-u)\,N(du),$

where $\int_0^\infty h(t)\,dt < 1$. We review here the known results of existence and uniqueness
of a stationary version of the process. We follow the arguments of Chapter 12 of
Daley and Vere-Jones [27].

The existence of a stationary version of the process can be seen from the
immigration-birth representation of the linear Hawkes process. To show uniqueness,
let us do the following. Let $N^\dagger$ be a stationary version with intensity

(1.6)  $\lambda_t^\dagger = \nu + \int_{-\infty}^t h(t-u)\,N^\dagger(du),$

and mean intensity $\mu := \mathbb{E}[\lambda_t^\dagger] = \frac{\nu}{1 - \|h\|_{L^1}}$. For both $N$ and $N^\dagger$, we consider the
shifted versions $\theta_s N$ and $\theta_s N^\dagger$ that bring the origin back to zero. $\theta_s N^\dagger$ can be split
into two components: the one with the same structure as $\theta_s N$, being generated from
the clusters initiated by immigrants arriving after time $-s$, and the component $N^\dagger_{-s}$
that counts the children of the immigrants that arrived before time $-s$. On $\mathbb{R}_+$,
the contribution from the latter forms a Poisson process with intensity

(1.7)  $\lambda^\dagger_{-s}(t) = \int h(t-u)\,N^\dagger_{-s}(du).$

For any $T < \infty$,

(1.8)  $\mathbb{P}(N^\dagger_{-s}(0, T) > 0) = \mathbb{E}\left[1 - e^{-\int_0^T \lambda^\dagger_{-s}(t)\,dt}\right] \le \mathbb{E}\left[\int_0^T \lambda^\dagger_{-s}(t)\,dt\right] \le \mu T \int_s^\infty h(u)\,du \to 0,$

as $s \to \infty$. Let $\mathcal{P}$ and $\mathcal{P}^\dagger$ represent the probability measures corresponding to $N$
and $N^\dagger$. For any $T > 0$, we have

(1.9)  $\|\theta_{-s}\mathcal{P} - \mathcal{P}^\dagger\|_{[0,T]} \le \mathbb{P}(N^\dagger_{-s}(0,T) > 0) \to 0,$

as $s \to \infty$, where $\|\cdot\|$ denotes the variation norm. This implies the weak convergence
and thus the weak asymptotic stationarity of $N$.

Let us now impose the stronger assumption $\int_0^\infty t h(t)\,dt < \infty$, i.e. the mean time to the
appearance of a child is finite. Since the mean number of offspring is also finite
(because $\|h\|_{L^1} < 1$), the random time $T$ from the appearance of an ancestor to
the last of its descendants has finite mean, i.e. $\mathbb{E}[T] < \infty$. Thus, we have

(1.10)  $\mathbb{P}(N^\dagger_{-s}[0, \infty) > 0) = 1 - e^{-\nu \int_s^\infty \mathbb{P}(T > u)\,du} \to 0,$

as $s \to \infty$ and $\|\theta_{-s}\mathcal{P} - \mathcal{P}^\dagger\|_{[0,\infty)} \to 0$ as $s \to \infty$, which implies that the process
starting from empty history is strongly asymptotically stationary.

Bremaud et al. [15] studied the rate of convergence to equilibrium in a more
general setting, i.e. the Hawkes process with random marks. Here, we only consider
the unmarked case. Assume $N(-\infty, 0] = 0$ and let $N^\dagger$ denote the unique stationary
Hawkes process. The convergence in variation is seen via coupling, namely, $N$ and
$N^\dagger$ are constructed on the same space and there exists a finite random time $T$ such
that

(1.11)  $\mathbb{P}(N(t, \infty) = N^\dagger(t, \infty) \text{ for all } t > T) = 1.$

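In the exponential case, there exists some $\beta > 0$ such that $\int_0^\infty e^{\beta t} h(t)\,dt = 1$.
Let us define

(1.12)  $H(t) := \frac{\nu}{1 - \|h\|_{L^1}} \int_t^\infty h(s)\,ds.$

If $e^{\beta t} H(t)$ is directly Riemann integrable on $\mathbb{R}_+$, then for any

(1.13)  $K > \frac{\int_0^\infty e^{\beta t} H(t)\,dt}{\beta \int_0^\infty t e^{\beta t} h(t)\,dt},$

there exists $t_0(K)$ such that $\mathbb{P}(T > t) \le K e^{-\beta t}$ for any $t > t_0(K)$.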

In the subexponential case, the distribution function $G$ with density $g(t) = \frac{h(t)}{\|h\|_{L^1}}$
is subexponential, in the sense that

(1.14)  $\lim_{t \to \infty} \frac{1 - G^{*n}(t)}{1 - G(t)} = n, \quad \text{for any } n \in \mathbb{N}.$

Further assume that $\int_0^\infty t h(t)\,dt < \infty$. Then, for any

(1.15)  $K > \frac{\nu \|h\|_{L^1}}{(1 - \|h\|_{L^1})^2},$

there exists some $t_0(K)$ such that for any $t > t_0(K)$, we have

(1.16)  $\mathbb{P}(T > t) \le K \int_t^\infty \bar{G}(u)\,du,$

where $\bar{G} = 1 - G$.

1.4.3 Bartlett Spectrum for Linear Hawkes Processes 

The methods of analysis for point processes by spectrum were introduced by
Bartlett [5] and [6]. We refer to Chapter 8 of Daley and Vere-Jones [27] for a
detailed discussion.

Let $N$ be a second-order stationary point process on $\mathbb{R}$. (For the definition
of a second-order stationary point process, we refer to Daley and Vere-Jones [27].)
Define the set $\mathcal{S}$ as the space of functions of rapid decay, i.e. $\phi \in \mathcal{S}$ if

(1.17)  $\left|\frac{d^k \phi(x)}{dx^k}\right| \le \frac{C(k,r)}{(1 + |x|)^r}$

for some constants $C(k,r) < \infty$ and all positive integers $r$ and $k$.

For bounded measurable $\phi$ with bounded support and also $\phi \in \mathcal{S}$, there exists
a measure $\Gamma$ on $\mathcal{B}$ such that

(1.18)  $\mathrm{Var}\left(\int_{\mathbb{R}} \phi(x)\,N(dx)\right) = \int_{\mathbb{R}} |\hat{\phi}(\omega)|^2\,\Gamma(d\omega),$

where $\hat{\phi}(\omega) = \int_{\mathbb{R}} e^{i\omega u} \phi(u)\,du$ is the Fourier transform of $\phi$. $\Gamma$ is referred to as the
Bartlett spectrum. We also have

(1.19)  $\mathrm{Cov}\left(\int_{\mathbb{R}} \phi(x)\,N(dx), \int_{\mathbb{R}} \psi(x)\,N(dx)\right) = \int_{\mathbb{R}} \hat{\phi}(\omega)\,\overline{\hat{\psi}(\omega)}\,\Gamma(d\omega).$

Hawkes [52] proved that for the linear stationary Hawkes process with

(1.20)  $\lambda_t = \nu + \int_{-\infty}^t h(t-s)\,N(ds),$

$\nu > 0$ and $\|h\|_{L^1} < 1$, the Bartlett spectrum is given by

(1.21)  $\Gamma(d\omega) = \frac{\nu}{2\pi (1 - \|h\|_{L^1})\,|1 - \hat{h}(\omega)|^2}\,d\omega.$
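
To make the spectral density concrete, here is a minimal sketch that evaluates it under the assumption of an exponential kernel $h(t) = \alpha e^{-\beta t}$ with $\alpha < \beta$ (an illustrative choice). With the Fourier convention $\hat{h}(\omega) = \int e^{i\omega t} h(t)\,dt$ used above, this kernel has $\hat{h}(\omega) = \alpha/(\beta - i\omega)$. The function name is hypothetical.

```python
import math


def bartlett_density(omega, nu, alpha, beta):
    """Density of the Bartlett spectrum of a stationary linear Hawkes process
    at frequency omega, assuming h(t) = alpha * exp(-beta * t), alpha < beta."""
    norm_h = alpha / beta                   # ||h||_{L^1}
    h_hat = alpha / complex(beta, -omega)   # int_0^inf e^{i omega t} h(t) dt
    return nu / (2 * math.pi * (1 - norm_h) * abs(1 - h_hat) ** 2)
```

Two sanity checks follow from the formula itself: at $\omega = 0$ the density equals $\nu / (2\pi (1 - \|h\|_{L^1})^3)$, and as $|\omega| \to \infty$ it tends to $\nu / (2\pi (1 - \|h\|_{L^1}))$, the value for a Poisson process with the same mean intensity.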

Moreover, if $\mu(\tau) := \mathbb{E}[dN(t + \tau)\,dN(t)]/(dt)^2 - \mu^2$ is the covariance density,
where $\mu := \frac{\nu}{1 - \|h\|_{L^1}}$, then Hawkes [51] proved that $\mu(\tau) = \mu(-\tau)$, $\tau > 0$, satisfies
the equation

(1.22)  $\mu(\tau) = \mu h(\tau) + \int_{-\infty}^{\tau} h(\tau - v)\,\mu(v)\,dv.$

Since $\mu(\tau) = \mu(-\tau)$, we have

(1.23)  $\mu(\tau) = \mu h(\tau) + \int_0^\infty h(\tau + v)\,\mu(v)\,dv + \int_0^\tau h(\tau - v)\,\mu(v)\,dv, \quad \tau > 0.$

In general, $\mu(\tau)$ may not have an analytical form. However, when $h(\cdot)$ is exponential,
say $h(t) = \alpha e^{-\beta t}$, Hawkes [51] showed that

(1.24)  $\mu(\tau) = \frac{\alpha \nu \beta (2\beta - \alpha)}{2(\beta - \alpha)^2}\,e^{-(\beta - \alpha)\tau}, \quad \tau > 0.$

The Bartlett spectrum analysis has later been generalized to marked linear
Hawkes processes and some more general models. We refer to Bremaud and Massoulie
[18] and Bremaud and Massoulie [19].



1.4.4 Limit Theorems for Linear Hawkes Processes 

When $\lambda(\cdot)$ is linear, say $\lambda(z) = \nu + z$, for some $\nu > 0$ and $\|h\|_{L^1} < 1$, the
Hawkes process has a very nice immigration-birth representation, see for example
Hawkes and Oakes [54]. For such a linear Hawkes process, the limit theorems are
very well understood. Consider a stationary Hawkes process $N^\dagger$ with intensity

(1.25)  $\lambda_t^\dagger = \nu + \int_{-\infty}^t h(t-s)\,N^\dagger(ds).$

Taking expectations on both sides of the above equation and using stationarity,
we get

(1.26)  $\mu := \mathbb{E}[\lambda_t^\dagger] = \nu + \int_{-\infty}^t h(t-s)\,\mathbb{E}[\lambda_s^\dagger]\,ds = \nu + \mu \|h\|_{L^1},$

which implies that $\mu = \frac{\nu}{1 - \|h\|_{L^1}}$. By the ergodic theorem, we have

(1.27)  $\frac{N_t}{t} \to \frac{\nu}{1 - \|h\|_{L^1}}, \quad \text{as } t \to \infty \text{ a.s.}$



Moreover, Bordenave and Torrisi [11] proved a large deviation principle for $(N_t/t \in \cdot)$.

Theorem 1 (Bordenave and Torrisi 2007). $(N_t/t \in \cdot)$ satisfies a large deviation
principle with the rate function

(1.28)  $I(x) = \begin{cases} x \log\left(\frac{x}{\nu + x\|h\|_{L^1}}\right) - x + x\|h\|_{L^1} + \nu & \text{if } x \in [0, \infty), \\ +\infty & \text{otherwise.} \end{cases}$
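
As a quick numerical illustration, the rate function can be evaluated directly. This is a minimal sketch with a hypothetical function name; the value at $x = 0$ uses the convention $0 \log 0 = 0$, which is the limit of the formula as $x \to 0+$.

```python
import math


def ldp_rate(x, nu, norm_h):
    """I(x) = x log(x / (nu + x ||h||)) - x + x ||h|| + nu on [0, inf), +inf otherwise.

    Assumes nu > 0 and 0 <= norm_h < 1, where norm_h stands for ||h||_{L^1}.
    """
    if x < 0:
        return math.inf
    if x == 0:
        return nu  # x * log(...) -> 0 as x -> 0+
    return x * math.log(x / (nu + x * norm_h)) - x + x * norm_h + nu
```

The rate function vanishes exactly at the law-of-large-numbers limit $x = \nu/(1 - \|h\|_{L^1})$. For instance, with $\nu = 1$ and $\|h\|_{L^1} = 1/2$ one gets $I(2) = 0$, while $I(1)$ and $I(4)$ are strictly positive.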



Recently, Bacry et al. [2] proved a functional central limit theorem for the linear
multivariate Hawkes process under certain assumptions. That includes the linear
Hawkes process as a special case, for which they proved the following.

Theorem 2 (Bacry et al. 2011).

(1.29)  $\frac{N_{\cdot t} - \cdot \mu t}{\sqrt{t}} \to \sigma B(\cdot), \quad \text{as } t \to \infty,$

where $B(\cdot)$ is a standard Brownian motion. The convergence is weak convergence
on $D[0, 1]$, the space of càdlàg functions on $[0, 1]$, equipped with the Skorokhod topology.
Here,

(1.30)  $\mu = \frac{\nu}{1 - \|h\|_{L^1}} \quad \text{and} \quad \sigma^2 = \frac{\nu}{(1 - \|h\|_{L^1})^3}.$
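
As a worked example of these constants, assume the exponential kernel $h(t) = \alpha e^{-\beta t}$ with $0 < \alpha < \beta$ (an illustrative choice, not one fixed by the theorem). Then the mean intensity and the limiting variance become explicit:

```latex
\|h\|_{L^1} = \int_0^\infty \alpha e^{-\beta t}\,dt = \frac{\alpha}{\beta},
\qquad
\mu = \frac{\nu}{1 - \alpha/\beta} = \frac{\nu\beta}{\beta - \alpha},
\qquad
\sigma^2 = \frac{\nu}{(1 - \alpha/\beta)^3} = \frac{\nu\beta^3}{(\beta - \alpha)^3}.
```

For instance, $\nu = 1$, $\alpha = 1/2$, $\beta = 1$ gives $\mu = 2$ and $\sigma^2 = 8$, so the variance per unit time is four times the mean: self-excitation makes the process overdispersed relative to a Poisson process of the same rate.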

Unlike the central limit theorem and the law of the iterated logarithm, there are
not as many good criteria one can use to prove the moderate deviation principle for
nonlinear Hawkes processes, which would fill in the gap between the central limit
theorem and the large deviation principle. Nevertheless, due to the analytical
tractability and immigration-birth representation of the linear Hawkes process, Zhu
[114] proved the moderate deviations for linear Hawkes processes.

Theorem 3. Assume $\lambda(z) = \nu + z$, $\nu > 0$, $\|h\|_{L^1} < 1$ and $\sup_{t > 0} t^{3/2} h(t) \le C < \infty$.
For any Borel set $A$ and time sequence $a(t)$ such that $\sqrt{t} \ll a(t) \ll t$, we have the
following moderate deviation principle:

(1.31)  $-\inf_{x \in A^o} J(x) \le \liminf_{t \to \infty} \frac{t}{a(t)^2} \log \mathbb{P}\left(\frac{N_t - \mu t}{a(t)} \in A\right) \le \limsup_{t \to \infty} \frac{t}{a(t)^2} \log \mathbb{P}\left(\frac{N_t - \mu t}{a(t)} \in A\right) \le -\inf_{x \in \bar{A}} J(x),$

where $J(x) = \frac{x^2 (1 - \|h\|_{L^1})^3}{2\nu}$.

The proof of Theorem 3 will be given in Appendix A.
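
It may help to see how the moderate deviation rate function sits between the two neighbouring limit theorems. The following short computation, written with $n := \|h\|_{L^1}$ as a shorthand introduced only here, checks that $J$ is both the Gaussian rate function attached to the central limit variance $\sigma^2 = \nu/(1-n)^3$ and the quadratic approximation at $\mu = \nu/(1-n)$ of the large deviation rate function $I(x) = x\log\frac{x}{\nu + xn} - x + xn + \nu$:

```latex
J(x) = \frac{x^2 (1-n)^3}{2\nu} = \frac{x^2}{2\sigma^2},
\qquad
I''(x) = \frac{1}{x} - \frac{n}{\nu + xn} - \frac{n\nu}{(\nu + xn)^2},
\qquad
I''(\mu) = \frac{1-n}{\mu} - \frac{n(1-n)}{\mu} = \frac{(1-n)^3}{\nu} = \frac{1}{\sigma^2},
```

where the evaluation at $\mu$ uses $\nu + \mu n = \mu$ and $\nu/\mu = 1 - n$. Hence $J(x) = \tfrac{1}{2} I''(\mu)\,x^2$.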






In a nutshell, linear Hawkes processes satisfy very nice limit theorems and the 
limits can be computed more or less explicitly. 

1.4.5 Simulations and Calibrations 

Assume the past of a Hawkes process is known up to the present time zero, say the
configuration of the history is $\omega^-$. Let $\tau_1$ be the first jump after time zero. Then,
it is easy to see that

(1.32)  $\mathbb{P}(\tau_1 > t) = e^{-\int_0^t \lambda_s^{\omega^-}\,ds},$

where $\lambda_s^{\omega^-} = \nu + \sum_{\tau \in \omega^-} h(s - \tau)$. This leads to a straightforward simulation
method which is applicable to any simple point process. This algorithm and its
theoretical foundation go back to a thinning procedure given by Lewis and Shedler
[70]. In the context of Hawkes processes, this simulation method was first used in
Ogata [89]. It is sometimes called Ogata's modified thinning algorithm.
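
A thinning simulation of this kind can be sketched in a few lines. The version below is an illustration only, not the algorithm as stated in the cited papers: it assumes the exponential kernel $h(t) = \alpha e^{-\beta t}$, for which the intensity decreases between jumps, so the current intensity is a valid upper bound until the next accepted point. The function name and the `history` argument are hypothetical.

```python
import math
import random


def simulate_by_thinning(nu, alpha, beta, horizon, history=(), seed=0):
    """Thinning for the intensity nu + sum_{tau < t} alpha * exp(-beta * (t - tau)).

    `history` holds known event times <= 0 (the configuration of the past);
    the function returns the new points in (0, horizon]. Requires nu > 0.
    """
    rng = random.Random(seed)
    # excitation = sum over past points tau of h(t - tau), tracked at the current time t
    excitation = sum(alpha * math.exp(beta * tau) for tau in history)
    t, points = 0.0, []
    while True:
        bound = nu + excitation                 # dominates the intensity until the next jump
        wait = rng.expovariate(bound)           # candidate from a Poisson process of rate `bound`
        t += wait
        if t > horizon:
            return points
        excitation *= math.exp(-beta * wait)    # decay the excitation to the candidate time
        if rng.random() * bound <= nu + excitation:
            points.append(t)                    # accepted with probability intensity / bound
            excitation += alpha                 # the intensity jumps by h(0) = alpha
```

Starting from a non-empty `history` raises the initial intensity and therefore shortens the typical waiting time to the first jump, in line with the survival formula for $\tau_1$.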

If we want to simulate the stationary version of the Hawkes process on a finite
time interval, then the standard simulation method described above
does not work, as the past of the process is not known and cannot be simulated, at
least not completely.

If one ignores the past of the process and simply starts to simulate the process
at some given time, one speaks of an approximate simulation. In this case,
one is actually simulating a transient version and not the stationary version of the
process. But if one simulates over a long enough time interval, then the transient
version converges to the stationary one. Such an approximate simulation method
for Hawkes processes was discussed in Møller and Rasmussen [80]. A simulation
method which directly simulates the stationary version without approximation is a
so-called perfect simulation method. The idea is to incorporate the effect
of past observations without actually simulating the past of the process. For point
processes, this type of simulation was first described in Brix and Kendall [20].
In the context of Hawkes processes, the perfect simulation method was discussed
in Møller and Rasmussen [79].

The calibration, i.e. the estimation of the parameters of Hawkes processes,
was first studied in Vere-Jones [109] and Ozaki [91], based on a maximum likelihood
method for point processes introduced by Rubin [55]. The properties of the
maximum likelihood estimator were analyzed in Ogata [86].
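
To illustrate what a likelihood-based calibration has to evaluate, here is a minimal sketch of the log-likelihood $\sum_i \log \lambda_{t_i} - \int_0^T \lambda_s\,ds$ of a linear Hawkes process observed on $[0, T]$ with empty history. It assumes the exponential kernel $h(t) = \alpha e^{-\beta t}$, for which the sum over past points can be updated recursively; the function name is hypothetical and the code is not taken from the cited references.

```python
import math


def exp_hawkes_loglik(times, horizon, nu, alpha, beta):
    """Log-likelihood on [0, horizon] of a linear Hawkes process with empty history,
    assuming h(t) = alpha * exp(-beta * t). `times` must be sorted and lie in (0, horizon].
    """
    loglik = -nu * horizon                     # minus the integral of the baseline rate
    excitation, previous = 0.0, None           # excitation = sum_{tau < t_i} h(t_i - tau)
    for t in times:
        if previous is not None:
            # recursive update of the sum over earlier points
            excitation = math.exp(-beta * (t - previous)) * (excitation + alpha)
        loglik += math.log(nu + excitation)    # log-intensity just before the jump at t
        # minus the integral of h(s - t) over s in (t, horizon]
        loglik -= (alpha / beta) * (1.0 - math.exp(-beta * (horizon - t)))
        previous = t
    return loglik
```

A parameter estimate would then be obtained by maximizing this function over $(\nu, \alpha, \beta)$ with any numerical optimizer; that step is omitted here.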

In Marsan and Lengline [73], an Expectation-Maximization (EM) algorithm,
called "Model Independent Stochastic Declustering" (MISD), is introduced for the
nonparametric estimation of self-exciting point processes with time-homogeneous
background rate. (For the linear Hawkes process with intensity $\lambda_t = \nu_t + \sum_{\tau < t} h(t - \tau)$,
$\nu_t$ is the background rate and $h(\cdot)$ is the exciting function.)

The efficacy of the MISD algorithm was studied in Sornette and Utkin [101],
where the authors found that the ability of MISD to recover key parameters such as
$\|h\|_{L^1}$ depends on the values of the model parameters. In particular, they pointed
out that the accuracy of MISD improves as the timescale over which the exciting
function $h(\cdot)$ decays shortens. Lewis and Mohler [68] introduced a Maximum
Penalized Likelihood Estimation (MPLE) approach for the nonparametric
estimation of Hawkes processes. The method is capable of estimating $\nu_t$ and $h(t)$
simultaneously, without prior knowledge of their form. Analogous to MPLE in
the context of density estimation, the added regularity of the estimates allows for
higher accuracy and/or lower sample sizes in comparison to MISD.

1.5 Nonlinear Hawkes Processes

Consider a simple point process with intensity

(1.33)  $\lambda_t = \lambda\left(\int_{-\infty}^t h(t-s)\,N(ds)\right),$

where $\lambda(\cdot) : \mathbb{R}_+ \to \mathbb{R}_+$ and $h(\cdot) : \mathbb{R}_+ \to [0, \infty)$. Bremaud and Massoulie [17]
studied the existence and uniqueness of a stationary nonlinear Hawkes process
that satisfies the dynamics (1.33) as well as its stability in distribution and in
variation. They allow $h(\cdot)$ to take negative values as well. In this thesis, we always
consider $h(\cdot)$ to be nonnegative.

The following result is about the existence of a stationary nonlinear Hawkes
process satisfying the dynamics (1.33). We do not need $\lambda(\cdot)$ to be Lipschitz.

Theorem 4 (Bremaud and Massoulie [17]). Let $\lambda(\cdot)$ be a nonnegative, nondecreasing
and left-continuous function, satisfying $\lambda(z) \le C + \alpha z$ for any $z \ge 0$, for some
$C > 0$ and $\alpha > 0$, and let $h(\cdot) : \mathbb{R}_+ \to \mathbb{R}_+$ be such that $\alpha \int_0^\infty h(t)\,dt < 1$. Then
there exists a stationary point process $N$ with dynamics (1.33).



The following result concerns the uniqueness and stability in distribution and
in variation of a nonlinear Hawkes process.

Theorem 5 (Bremaud and Massoulie [17]). Let $\lambda(\cdot)$ be $\alpha$-Lipschitz such that
$\alpha \|h\|_{L^1} < 1$.

(i) There exists a unique stationary distribution of $N$ with finite average intensity
$\mathbb{E}[N(0, 1]]$ and with dynamics (1.33).

(ii) Let $\epsilon_a(t) := \int_{t-a}^t \int_{\mathbb{R}_-} h(s - u)\,N(du)\,ds$. The dynamics (1.33) are stable
in distribution with respect to either the initial condition (1.34) or the condition
(1.35) below,

(1.34)  $\sup_{t \ge 0} \epsilon_a(t) < \infty$ a.s. and $\lim_{t \to \infty} \epsilon_a(t) = 0$ a.e. for every $a > 0$,

(1.35)  $\sup_{t \ge 0} \mathbb{E}[\epsilon_a(t)] < \infty$ and $\lim_{t \to \infty} \mathbb{E}[\epsilon_a(t)] = 0$ for every $a > 0$.

(iii) The dynamics (1.33) are stable in variation with respect to the initial
condition

(1.36)  $\int_0^\infty h(t)\,N[-t, 0)\,dt = \int_0^\infty \int_{-\infty}^0 h(t - s)\,N(ds)\,dt < \infty$, a.s.,

if we assume further that $\int_0^\infty t h(t)\,dt < \infty$.

Massoulie [H] extended the stability results to nonlinear Hawkes processes with 
random marks. He also considered the Markovian case and proved stability results 
without the Lipschitz condition for $\lambda(\cdot)$.

Very recently, Karabash [63] proved stability results for a much wider class of
nonlinear Hawkes processes, including the case when $\lambda(\cdot)$ is not Lipschitz.

Moreover, Bremaud et al. [15] considered the rate of extinction for the nonlinear
Hawkes process, that is, the rate of convergence to equilibrium when the
stationary process is an empty process. Indeed, they considered a more general
setting, i.e. the Hawkes process with random marks. Let $N$ be a nonlinear Hawkes
process which is empty on $(-\infty, 0]$, i.e. $N(-\infty, 0] = 0$, and which satisfies the dynamics

(1.37)  $\lambda_t := \nu(t) + \phi\left(\int_0^t h(t - s)\,N(ds)\right),$

where $\nu : \mathbb{R}_+ \to \mathbb{R}_+$ is locally integrable, $\phi : \mathbb{R} \to [0, \infty)$, $\phi(0) = 0$, is 1-Lipschitz,
and $h : \mathbb{R}_+ \to \mathbb{R}$ is measurable and not necessarily nonnegative, with
$\int_0^\infty |h(t)|\,dt < 1$. The unique stationary process $N^0$ corresponding to the dynamics

(1.38)  $\phi\left(\int_{-\infty}^t h(t - s)\,N^0(ds)\right)$

is the empty process. Assume $\int_0^\infty \nu(t)\,dt < \infty$, $\int_0^\infty t h(t)\,dt < \infty$ and $t \mapsto |h(t)|$ is
locally bounded.

Then $\theta_t N$ converges in variation to the empty process. The convergence in
variation takes place via coupling in the sense that there exists a finite random
time $T$ so that

(1.39)  $\mathbb{P}(N(t, \infty) = 0 \text{ for any } t > T) = 1.$

Depending on whether the tail of $|h(t)|$ is exponential or subexponential, the
following was obtained by Bremaud et al. [15].

In the exponential case, let $\beta > 0$ be such that $\int_0^\infty e^{\beta t} |h(t)|\,dt = 1$. Assume
$e^{\beta t} \nu(t)$ is directly Riemann integrable. Then, for any $K$ with

(1.40)  $K > \frac{\int_0^\infty e^{\beta t} \nu(t)\,dt}{\beta \int_0^\infty t e^{\beta t} |h(t)|\,dt},$

there exists $t_0(K)$ such that, for any $t > t_0(K)$,

(1.41)  $\mathbb{P}(T > t) \le K e^{-\beta t}.$

In the subexponential case, assume that the distribution function $G$ with density
$g(t) = \frac{|h(t)|}{\int_0^\infty |h(s)|\,ds}$ is subexponential, $\nu(\cdot)$ is bounded and that $B = \limsup_{t \to \infty} \frac{\nu(t)}{\bar{G}(t)} < \infty$,
where $\bar{G} = 1 - G$. Then for any

(1.42)  $K > \frac{B}{1 - \int_0^\infty |h(t)|\,dt},$

there exists $t_0(K)$ such that for any $t > t_0(K)$,

(1.43)  $\mathbb{P}(T > t) \le K \int_t^\infty \bar{G}(s)\,ds.$

Kwiecinski and Szekli [66] considered the nonlinear Hawkes process as a special
case of a self-exciting process. Let $\mathcal{N}(\mathbb{R}_+)$ be the space of point processes on $\mathbb{R}_+$,
whose elements can be regarded as elements of $\mathcal{D}(\mathbb{R}_+)$, the space of functions which
are right-continuous with left limits, equipped with the Skorohod topology. For any
$\mu, \nu \in \mathcal{N}(\mathbb{R}_+)$, $\mu \prec_{\mathcal{N}} \nu$ if $\mu(B) \le \nu(B)$ for any bounded set $B \in \mathcal{B}(\mathbb{R}_+)$. For any
$\mu, \nu \in \mathcal{N}(\mathbb{R}_+)$, $\mu \prec_{\mathcal{D}} \nu$ if and only if $(\mu_t) \prec_{\mathcal{D}} (\nu_t)$ for the corresponding functions
$\mu_t := \mu((0, t])$, $\nu_t := \nu((0, t]) \in \mathcal{D}(\mathbb{R}_+)$, i.e. $\mu_t \le \nu_t$ for all $t \ge 0$.

Now, for a simple point process $N$ with intensity $\lambda(t, N)$ and compensator
$\Lambda(t, N) := \int_0^t \lambda(s, N)\,ds$, we say that $N$ is positively self-exciting w.r.t. $\prec_{\mathcal{N}}$ if for
any $\mu, \nu \in \mathcal{N}(\mathbb{R}_+)$,

(1.44)  $\mu \prec_{\mathcal{N}} \nu$ implies that for any $t \ge 0$, $\Lambda(t, \mu) \le \Lambda(t, \nu)$,

and $N$ is positively self-exciting w.r.t. $\prec_{\mathcal{D}}$ if for any $\mu, \nu \in \mathcal{N}(\mathbb{R}_+)$,

(1.45)  $\mu \prec_{\mathcal{D}} \nu$ implies that for any $t \ge 0$, $\Lambda(t, \mu) \le \Lambda(t, \nu)$.






Kwiecinski and Szekli [66] pointed out that if $h(\cdot)$ is nonnegative and $\lambda(\cdot)$
nondecreasing, then $N$ is positively self-exciting with respect to $\prec_{\mathcal{N}}$, and that
if $h(\cdot)$ is nonnegative and nondecreasing and $\lambda(\cdot)$ is nondecreasing, then $N$ is
positively self-exciting with respect to $\prec_{\mathcal{D}}$.

Let $(\Omega, \mathcal{F})$ be a Polish space with a closed partial ordering $\prec$. A probability
measure $P$ on $(\Omega, \mathcal{F})$ is associated $(\prec)$ if

(1.46)  $P(C_1 \cap C_2) \ge P(C_1) P(C_2),$

for all increasing sets $C_1, C_2 \in \mathcal{F}$ (a set $C$ is increasing if $x \in C$ and $x \prec y$ implies
$y \in C$).

Kwiecinski and Szekli [66] proved that if $N$ is a positively self-exciting point process
w.r.t. $\prec_{\mathcal{N}}$ (resp. $\prec_{\mathcal{D}}$), then $N$ is associated $(\prec_{\mathcal{N}})$ (resp. $(\prec_{\mathcal{D}})$). Therefore,
for a nonlinear Hawkes process, if $h(\cdot)$ is nonnegative and $\lambda(\cdot)$ nondecreasing,
then $N$ is associated $(\prec_{\mathcal{N}})$, and if $h(\cdot)$ is nonnegative and nondecreasing
and $\lambda(\cdot)$ is nondecreasing, then $N$ is associated $(\prec_{\mathcal{D}})$.

Next, let us consider the limit theorems for the nonlinear Hawkes process. When
$\lambda(\cdot)$ is nonlinear, the usual immigration-birth representation no longer works and
one may have to use some abstract theory to obtain limit theorems. Some progress
has already been made.

The stability result of Bremaud and Massoulie [17] implies that, by the ergodic theorem,

(1.47)  $\frac{N_t}{t} \to \mu := \mathbb{E}[N[0, 1]],$

as $t \to \infty$, where $\mathbb{E}[N[0, 1]]$ is the mean of $N[0, 1]$ under the stationary and ergodic
measure.

In this thesis, we will obtain a functional central limit theorem and a Strassen's
invariance principle in Chapter 2 and a process-level, i.e. level-3, large deviation
principle in Chapter 3, and thus a level-1 large deviation principle by the contraction
principle. We will also obtain an alternative expression for the rate function of the
level-1 large deviation principle for Markovian nonlinear Hawkes processes as a
variational formula in Chapter 4.

1.6 Multivariate Hawkes Processes 

We say $N = (N_1, \ldots, N_d)$ is a multivariate Hawkes process if for any $1 \le i \le d$,
$N_i$ is a simple point process with intensity

(1.48)  $\lambda_{i,t} := \nu_i + \int_0^t \sum_{j=1}^d h_{ij}(t - s)\,dN_{j,s},$

where $\nu_i \in \mathbb{R}_+$ and $h_{ij}(\cdot) : \mathbb{R}_+ \to \mathbb{R}_+$. Then, $\nu := (\nu_1, \ldots, \nu_d)$ is a vector and
$h := (h_{ij})_{1 \le i,j \le d}$ is a $d \times d$ matrix-valued function.

Let us assume that for any $i, j$, $\int_0^\infty h_{ij}(t)\,dt < \infty$ and that the spectral radius
$\rho(K)$ of the matrix $K = \int_0^\infty h(t)\,dt$ satisfies $\rho(K) < 1$. Then, Bacry et al. [2]
proved a law of large numbers, i.e.

(1.49)  $\sup_{u \in [0,1]} \|T^{-1} N_{Tu} - u (I - K)^{-1} \nu\| \to 0,$

as $T \to \infty$ almost surely and also in $L^2(\mathbb{P})$. If we assume further that for any
$1 \le i, j \le d$,

(1.50)  $\int_0^\infty h_{ij}(t)\,t^{1/2}\,dt < \infty,$

then Bacry et al. [2] proved the following central limit theorem:

(1.51)  $\sqrt{T}\left(\frac{1}{T} N_{Tu} - u (I - K)^{-1} \nu\right), \quad u \in [0, 1],$

converges in law as $T \to \infty$ under the Skorohod topology to

(1.52)  $(I - K)^{-1} \Sigma^{1/2} W_u, \quad u \in [0, 1],$

where $W$ is a standard $d$-dimensional Brownian motion and $\Sigma$ is the diagonal matrix
with $\Sigma_{ii} = ((I - K)^{-1} \nu)_i$, $1 \le i \le d$.
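
The vector $(I - K)^{-1}\nu$, which gives both the law-of-large-numbers limit and the diagonal of $\Sigma$, is easy to compute for a concrete $K$. The sketch below is illustrative only: the helper names are hypothetical, and instead of computing the spectral radius it checks the sufficient condition that every row sum of the nonnegative matrix $K$ is below 1, which implies $\rho(K) < 1$.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mean_intensity(K, nu):
    """Return (I - K)^{-1} nu for a nonnegative matrix K.

    Checks max row sum < 1, a sufficient (not necessary) condition for rho(K) < 1.
    """
    d = len(nu)
    if max(sum(row) for row in K) >= 1:
        raise ValueError("row sums of K must be < 1 for this sketch")
    a = [[(1.0 if i == j else 0.0) - K[i][j] for j in range(d)] for i in range(d)]
    return solve_linear(a, nu)
```

The result $m$ satisfies the fixed-point relation $m = \nu + K m$, which is the multivariate analogue of $\mu = \nu + \mu \|h\|_{L^1}$ in the univariate case.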

It is well known that under the assumption that $\rho(K) < 1$, there exists a
unique stationary version of the multivariate Hawkes process satisfying the dynamics
(1.48). The rate of convergence to the stationary version of the multivariate
Hawkes process was obtained in Torrisi [103]. The Bartlett spectrum of the
multivariate Hawkes process was derived in Hawkes [52]. Some non-asymptotic
estimates for multivariate Hawkes processes were obtained in Hansen et al. [49].
A nice survey on multivariate linear Hawkes processes can be found in Liniger
[71].

Chapter 2

Central Limit Theorem for Nonlinear Hawkes Processes



2.1 Main Results 

In this chapter, we obtain a functional central limit theorem for the nonlinear
Hawkes process under Assumption 1. Under the same assumption, a Strassen's
invariance principle also holds. Let us recall that $N$ is a nonlinear Hawkes process
with intensity

(2.1)  $\lambda_t := \lambda\left(\int_{(-\infty, t)} h(t - s)\,N(ds)\right).$

Assumption 1. We assume that

• $h(\cdot) : [0, \infty) \to \mathbb{R}_+$ is a decreasing function and $\int_0^\infty t h(t)\,dt < \infty$.

• $\lambda(\cdot)$ is positive, increasing and $\alpha$-Lipschitz (i.e. $|\lambda(x) - \lambda(y)| \le \alpha |x - y|$ for
any $x, y$) and $\alpha \|h\|_{L^1} < 1$.

Bremaud and Massoulie [17] proved that if $\lambda(\cdot)$ is $\alpha$-Lipschitz with $\alpha \|h\|_{L^1} < 1$,
there exists a unique stationary and ergodic Hawkes process satisfying the dynamics
(1.2). Hence, under our Assumption 1 (which is slightly stronger than that of [17]),
there exists a unique stationary and ergodic Hawkes process satisfying the dynamics
(1.2).



Let $\mathbb{P}$ and $\mathbb{E}$ denote the probability measure and expectation for a stationary,
ergodic Hawkes process, and let $\mathbb{P}(\cdot|\mathcal{F}^{-\infty})$ and $\mathbb{E}(\cdot|\mathcal{F}^{-\infty})$ denote the conditional
probability measure and expectation given the past history.

The following are the main results of this chapter. 

Theorem 6. Under Assumption 1, let $N$ be the stationary and ergodic nonlinear
Hawkes process with dynamics (1.2). We have

(2.2)  $\frac{N_{\cdot t} - \cdot \mu t}{\sqrt{t}} \to \sigma B(\cdot), \quad \text{as } t \to \infty,$

where $B(\cdot)$ is a standard Brownian motion and $0 < \sigma < \infty$, where

(2.3)  $\sigma^2 := \mathbb{E}[(N[0, 1] - \mu)^2] + 2 \sum_{j=1}^\infty \mathbb{E}[(N[0, 1] - \mu)(N[j, j+1] - \mu)].$

The convergence in (2.2) is weak convergence on $D[0, 1]$, the space of càdlàg functions
on $[0, 1]$, equipped with the Skorokhod topology.

Remark 1. By a standard central limit theorem for martingales, i.e. Theorem 9,
it is easy to see that

(2.4)  $\frac{N_{\cdot t} - \int_0^{\cdot t} \lambda_s\,ds}{\sqrt{t}} \to \sqrt{\mu}\,B(\cdot), \quad \text{as } t \to \infty,$

where $\mu = \mathbb{E}[N[0, 1]]$. In the linear case, say $\lambda(z) = \nu + z$, Bacry et al. [2]
proved that $\sigma^2$ in (2.3) satisfies $\sigma^2 = \frac{\nu}{(1 - \|h\|_{L^1})^3} > \mu = \frac{\nu}{1 - \|h\|_{L^1}}$. That is not
surprising because $N_{\cdot t} - \cdot \mu t$ "should" have more fluctuations than $N_{\cdot t} - \int_0^{\cdot t} \lambda_s\,ds$.
Therefore, we guess that for nonlinear $\lambda(\cdot)$, $\sigma^2$ defined in (2.3) should also satisfy
$\sigma^2 > \mu = \mathbb{E}[N[0, 1]]$. However, it might not be very easy to compute and say
something about $\sigma^2$ in such a case.

In the classical case for a sequence of i.i.d. random variables $X_i$ with mean 0 and
variance 1, we have the central limit theorem $\frac{1}{\sqrt{n}} \sum_{i=1}^n X_i \to N(0, 1)$ as $n \to \infty$, and
we also have $\frac{\sum_{i=1}^n X_i}{\sqrt{n \log\log n}} \to 0$ in probability as $n \to \infty$, but the convergence does not
hold a.s. The law of the iterated logarithm says that $\limsup_{n \to \infty} \frac{\sum_{i=1}^n X_i}{\sqrt{n \log\log n}} = \sqrt{2}$
a.s. A functional version of the law of the iterated logarithm is called Strassen's
invariance principle.

It turns out that we also have a Strassen's invariance principle for nonlinear
Hawkes processes under Assumption 1.

Theorem 7. Under Assumption 1, let $N$ be the stationary and ergodic nonlinear
Hawkes process with dynamics (1.2). Let $X_n := N[n-1, n] - \mu$, $S_n := \sum_{i=1}^n X_i$,
$s_n^2 := \mathbb{E}[S_n^2]$, $g(t) = \sup\{n : s_n^2 \le t\}$, and for $t \in [0, 1]$, let $\eta_n(t)$ be the usual linear
interpolation, i.e.

(2.5)  $\eta_n(t) = \frac{S_k + (s_n^2 t - s_k^2)(s_{k+1}^2 - s_k^2)^{-1} X_{k+1}}{\sqrt{2 s_n^2 \log\log s_n^2}}, \quad s_k^2 \le s_n^2 t \le s_{k+1}^2, \quad k = 0, 1, \ldots, n-1.$

Then, $g(e) < \infty$; $\{\eta_n, n > g(e)\}$ is relatively compact in $C[0, 1]$, the set of continuous
functions on $[0, 1]$ equipped with the uniform topology, and the set of limit points
is the set of absolutely continuous functions $f(\cdot)$ on $[0, 1]$ such that $f(0) = 0$ and
$\int_0^1 f'(t)^2\,dt \le 1$.

2.2 Proofs

This section is devoted to the proof of Theorem 6. We use a standard central
limit theorem, i.e. Theorem 8. In our proof, we need the fact that $\mathbb{E}[N[0, 1]^2] < \infty$,
which is proved in Lemma 2. Lemma 2 is proved by proving a stronger result first,
i.e. Lemma 1. We will also prove Lemma 3 to guarantee that $\sigma > 0$ so that the
central limit theorem is not degenerate.

Let us first quote the two necessary central limit theorems from Billingsley [8].
In both Theorem 8 and Theorem 9, the filtrations are the natural ones, i.e. given
a stochastic process $(X_n)_{n \in \mathbb{Z}}$, $\mathcal{F}_a^b := \sigma(X_n, a \le n \le b)$, for $-\infty \le a \le b \le \infty$.

Theorem 8 (Page 197 [8]). Suppose $X_n$, $n \in \mathbb{Z}$, is an ergodic stationary sequence
such that $\mathbb{E}[X_n] = 0$ and

(2.6)  $\sum_{n \ge 1} \|\mathbb{E}[X_0 | \mathcal{F}_{-\infty}^{-n}]\|_2 < \infty,$

where $\|Y\|_2 = (\mathbb{E}[Y^2])^{1/2}$. Let $S_n = X_1 + \cdots + X_n$. Then $S_{[n\cdot]}/\sqrt{n} \to \sigma B(\cdot)$ weakly,
where the weak convergence is on $D[0, 1]$ equipped with the Skorohod topology and
$\sigma^2 = \mathbb{E}[X_0^2] + 2 \sum_{n=1}^\infty \mathbb{E}[X_0 X_n]$. The series converges absolutely.

Theorem 9 (Page 196 [8]). Suppose $X_n$, $n \in \mathbb{Z}$, is an ergodic stationary sequence
of square integrable martingale differences, i.e. $\sigma^2 = \mathbb{E}[X_n^2] < \infty$ and
$\mathbb{E}[X_n | \mathcal{F}_{-\infty}^{n-1}] = 0$. Let $S_n = X_1 + \cdots + X_n$. Then $S_{[n\cdot]}/\sqrt{n} \to \sigma B(\cdot)$ weakly, where
the weak convergence is on $D[0, 1]$ equipped with the Skorohod topology.

Now, we are ready to prove our main result. 

Proof of Theorem 6. In the stationary regime, $\mathbb{E}[N[n, n+1]] = \mathbb{E}[N[0, 1]]$ for
any $n \in \mathbb{Z}$; let us denote $\mathbb{E}[N[0, 1]] = \mu$. In order to apply Theorem 8, let us
first prove that

(2.7)  $\sum_{n=1}^\infty \left\{\mathbb{E}\left[\left(\mathbb{E}[N(n, n+1] - \mu \,|\, \mathcal{F}_0^{-\infty}]\right)^2\right]\right\}^{1/2} < \infty.$


n=l 



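Let $\mathbb{E}^{\omega_1^-}[N(n, n+1]]$ and $\mathbb{E}^{\omega_2^-}[N(n, n+1]]$ be two independent copies of
$\mathbb{E}[N(n, n+1] \,|\, \mathcal{F}_0^{-\infty}]$. It is easy to check that

(2.8)
$$\begin{aligned}
&\frac{1}{2}\mathbb{E}\left[\left(\mathbb{E}^{\omega_1^-}[N(n, n+1]] - \mathbb{E}^{\omega_2^-}[N(n, n+1]]\right)^2\right] \\
&= \frac{1}{2}\mathbb{E}\left[\mathbb{E}^{\omega_1^-}[N(n, n+1]]^2\right] + \frac{1}{2}\mathbb{E}\left[\mathbb{E}^{\omega_2^-}[N(n, n+1]]^2\right] - \mathbb{E}\left[\mathbb{E}^{\omega_1^-}[N(n, n+1]]\,\mathbb{E}^{\omega_2^-}[N(n, n+1]]\right] \\
&= \mathbb{E}\left[\mathbb{E}[N(n, n+1] \,|\, \mathcal{F}_0^{-\infty}]^2\right] - \mu^2 \\
&= \mathbb{E}\left[\left(\mathbb{E}[N(n, n+1] - \mu \,|\, \mathcal{F}_0^{-\infty}]\right)^2\right].
\end{aligned}$$

Therefore, we have

(2.9)
$$\begin{aligned}
&\mathbb{E}\left[\left(\mathbb{E}[N(n, n+1] - \mu \,|\, \mathcal{F}_0^{-\infty}]\right)^2\right] \\
&= \frac{1}{2}\mathbb{E}\left[\left(\mathbb{E}^{\omega_1^-}[N(n, n+1]] - \mathbb{E}^{\omega_2^-}[N(n, n+1]]\right)^2\right] \\
&\le \mathbb{E}\left[\left(\mathbb{E}^{\omega_1^-}[N(n, n+1]] - \mathbb{E}^{\emptyset}[N(n, n+1]]\right)^2\right] + \mathbb{E}\left[\left(\mathbb{E}^{\omega_2^-}[N(n, n+1]] - \mathbb{E}^{\emptyset}[N(n, n+1]]\right)^2\right] \\
&= 2\,\mathbb{E}\left[\left(\mathbb{E}^{\omega_1^-}[N(n, n+1]] - \mathbb{E}^{\emptyset}[N(n, n+1]]\right)^2\right],
\end{aligned}$$

where $\mathbb{E}^{\emptyset}[N(n, n+1]]$ denotes the expectation of the number of points in $(n, n+1]$
for the Hawkes process with the same dynamics (1.2) and empty history, i.e.
$N(-\infty, 0] = 0$.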

Next, let us estimate $\mathbb{E}^{\omega_1^-}[N(n, n+1]] - \mathbb{E}^{\emptyset}[N(n, n+1]]$. $\mathbb{E}^{\omega_1^-}[N(n, n+1]]$ is
the expectation of the number of points in $(n, n+1]$ for the Hawkes process with
intensity $\lambda_t = \lambda\left(\sum_{\tau \in \omega_1^- \cup \omega[0,t)} h(t - \tau)\right)$. It is well defined for a.e. $\omega_1^-$ under $\mathbb{P}$
because, under Assumption 1,

(2.10)  $\mathbb{E}[\lambda_t] \le \lambda(0) + \alpha\,\mathbb{E}\left[\int_{-\infty}^t h(t - s)\,N(ds)\right] = \lambda(0) + \alpha \|h\|_{L^1}\,\mathbb{E}[N[0, 1]] < \infty,$

which implies that $\lambda_t < \infty$ $\mathbb{P}$-a.s.

It is clear that $\mathbb{E}^{\omega_1^-}[N(n, n+1]] \ge \mathbb{E}^{\emptyset}[N(n, n+1]]$ almost surely, so we can use
a coupling method to estimate the difference. We will follow the ideas in Bremaud
and Massoulie [17] using the Poisson embedding method. Consider $(\Omega, \mathcal{F}, \mathcal{P})$, the
canonical space of a point process on $\mathbb{R}_+ \times \mathbb{R}_+$ in which $\bar{N}$ is Poisson with intensity
1 under the probability measure $\mathcal{P}$. Then the Hawkes process $N^0$ with empty past
history and intensity $\lambda_t^0$ satisfies the following:

(2.11)
$$\begin{aligned}
\lambda_t^0 &= \lambda\left(\int_{(0,t)} h(t - s)\,N^0(ds)\right), & t &\in \mathbb{R}_+, \\
N^0(C) &= \int_C \bar{N}(dt \times [0, \lambda_t^0]), & C &\in \mathcal{B}(\mathbb{R}_+).
\end{aligned}$$
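
The Poisson embedding can also be read as a simulation recipe: draw a unit-rate Poisson process on a strip and keep the points that fall below the intensity curve. The sketch below is an illustration under an extra assumption that the proof does not need, namely that the intensity is dominated by a finite constant `cap`, so that only the strip $(0, T] \times [0, \text{cap}]$ has to be sampled. All names are hypothetical.

```python
import math
import random


def poisson_embedding(rate_fn, h, horizon, cap, seed=0):
    """Hawkes points on (0, horizon] with empty history, read off a unit-rate
    Poisson process on (0, horizon] x [0, cap]: a Poisson point (t, y) is kept
    when y <= lambda_t. `cap` must dominate the intensity (assumption of this sketch).
    """
    rng = random.Random(seed)
    # Unit-rate Poisson points on the strip: times arrive at rate `cap`,
    # heights are uniform on [0, cap].
    candidates, t = [], rng.expovariate(cap)
    while t <= horizon:
        candidates.append((t, rng.uniform(0.0, cap)))
        t += rng.expovariate(cap)
    points = []
    for t, y in candidates:
        # lambda_t depends only on accepted points strictly before t
        intensity = rate_fn(sum(h(t - s) for s in points))
        if intensity > cap:
            raise ValueError("cap does not dominate the intensity")
        if y <= intensity:
            points.append(t)
    return points
```

Because every process is built from the same Poisson points, the coupling is monotone: if `rate_fn` is increasing, `h` is nonnegative, and one rate function lies below another, then with the same `seed` and `cap` the points produced by the smaller one form a subset of those produced by the larger one. This is the same monotonicity that the proof exploits when comparing the process with empty history to the process with a given past.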



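For $n \ge 1$, let us define recursively $\lambda^n$, $D_n$ and $N^n$ as follows:

(2.12)
$$\begin{aligned}
\lambda_t^n &= \lambda\left(\int_{(0,t)} h(t - s)\,N^{n-1}(ds) + \sum_{\tau \in \omega_1^-} h(t - \tau)\right), & t &\in \mathbb{R}_+, \\
D_n(C) &= \int_C \bar{N}(dt \times [\lambda_t^{n-1}, \lambda_t^n]), & C &\in \mathcal{B}(\mathbb{R}_+), \\
N^n(C) &= N^{n-1}(C) + D_n(C), & C &\in \mathcal{B}(\mathbb{R}_+).
\end{aligned}$$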

Following the arguments in Bremaud and Massoulie [17], we know that each $\lambda^n$
is an $\mathcal{F}_t^{\bar{N}}$-intensity of $N^n$, where $\mathcal{F}_t^{\bar{N}}$ is the $\sigma$-algebra generated by $\bar{N}$ up to time
$t$. By our Assumption 1, $\lambda(\cdot)$ is increasing, and it is clear that $\lambda_t^n$ and $N^n(C)$
increase in $n$ for all $t \in \mathbb{R}_+$ and $C \in \mathcal{B}(\mathbb{R}_+)$. Thus, $D_n$ is well defined and,
as $n \to \infty$, the limiting processes $\lambda_t$ and $N$ exist. $N$ counts the number of
points of $\bar{N}$ below the curve $t \mapsto \lambda_t$ and admits $\lambda_t$ as an $\mathcal{F}_t^{\bar{N}}$-intensity. By the
monotonicity properties of $\lambda^n$ and $N^n$, we have

(2.13)  $\lambda_t^n \le \lambda\left(\int_{(0,t)} h(t - s)\,N(ds) + \sum_{\tau \in \omega_1^-} h(t - \tau)\right),$

(2.14)  $\lambda_t \ge \lambda\left(\int_{(0,t)} h(t - s)\,N^n(ds) + \sum_{\tau \in \omega_1^-} h(t - \tau)\right).$

Letting $n \to \infty$ (it is valid since we assume that $\lambda(\cdot)$ is Lipschitz and thus continuous),
we conclude that $N$, $\lambda_t$ satisfy the dynamics (1.2). Therefore, with intensity
$\lambda_t$, $N = N^0 + \sum_{i=1}^\infty D_i$ is the Hawkes process with past history $\omega_1^-$.
We can then estimate the difference by noticing that

(2.15)  $\mathbb{E}^{\omega_1^-}[N(n, n+1]] - \mathbb{E}^{\emptyset}[N(n, n+1]] = \sum_{i=1}^\infty \mathbb{E}^{\mathcal{P}}[D_i(n, n+1]].$

Here $\mathbb{E}^{\mathcal{P}}$ means the expectation with respect to $\mathcal{P}$, the probability measure on the
canonical space that we defined earlier.

We have



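(2.16)
$$\begin{aligned}
\mathbb{E}^{\mathcal{P}}[D_1(n, n+1]]
&= \mathbb{E}^{\mathcal{P}}\left[\int_n^{n+1} (\lambda^1(t) - \lambda^0(t))\,dt\right] \\
&= \mathbb{E}^{\mathcal{P}}\left[\int_n^{n+1} \left(\lambda\Big(\sum_{\tau < t,\ \tau \in N^0 \cup \omega_1^-} h(t - \tau)\Big) - \lambda\Big(\sum_{\tau < t,\ \tau \in N^0 \cup \emptyset} h(t - \tau)\Big)\right) dt\right] \\
&\le \alpha \int_n^{n+1} \sum_{\tau \in \omega_1^-} h(t - \tau)\,dt,
\end{aligned}$$

where the first equality in (2.16) is due to the construction of $D_1$ in (2.12), the
second equality in (2.16) is due to the definitions of $\lambda^1$ and $\lambda^0$ in (2.12) and finally
the inequality in (2.16) is due to the fact that $\lambda(\cdot)$ is $\alpha$-Lipschitz. Similarly,

(2.17)
$$\begin{aligned}
\mathbb{E}^{\mathcal{P}}[D_2(n, n+1]]
&\le \mathbb{E}^{\mathcal{P}}\left[\alpha \int_n^{n+1} \sum_{\tau \in D_1,\ \tau < t} h(t - \tau)\,dt\right] \\
&\le \sum_{\tau \in \omega_1^-} \alpha^2 \int_n^{n+1} \int_0^t h(t - s)\,h(s - \tau)\,ds\,dt.
\end{aligned}$$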



Iteratively, we have, for any $k \in \mathbb{N}$,

$$\mathbb{E}^{\mathcal{P}}[D_k(n, n+1]] \le \sum_{\tau \in \omega_1^-} \alpha^k \int_n^{n+1} \int_0^{t_k} \cdots \int_0^{t_2} h(t_k - t_{k-1})\,h(t_{k-1} - t_{k-2}) \cdots h(t_2 - t_1)\,h(t_1 - \tau)\,dt_1 \cdots dt_k =: \sum_{\tau \in \omega_1^-} K_k(n, \tau).$$

Now let $K(n, \tau) := \sum_{k=1}^\infty K_k(n, \tau)$. Then,



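(2.18)
$$\begin{aligned}
\mathbb{E}\left[\left(\mathbb{E}^{\omega_1^-}[N(n, n+1]] - \mathbb{E}^{\emptyset}[N(n, n+1]]\right)^2\right]
&\le \mathbb{E}\left[\Big(\sum_{\tau \in \omega_1^-} K(n, \tau)\Big)^2\right] \\
&\le \mathbb{E}\left[\sum_{i,j \le 0} K(n, i)\,K(n, j)\,N[i, i+1]\,N[j, j+1]\right] \\
&= \sum_{i,j \le 0} K(n, i)\,K(n, j)\,\mathbb{E}[N[i, i+1]\,N[j, j+1]] \\
&\le \sum_{i,j \le 0} K(n, i)\,K(n, j)\,\frac{1}{2}\left\{\mathbb{E}[N[i, i+1]^2] + \mathbb{E}[N[j, j+1]^2]\right\} \\
&= \mathbb{E}[N[0, 1]^2] \Big(\sum_{i \le 0} K(n, i)\Big)^2.
\end{aligned}$$

Here, $\mathbb{E}[N[0, 1]^2] < \infty$ by Lemma 2. Therefore, we have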



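(2.19)
$$\begin{aligned}
&\sum_{n=1}^\infty \left\{\mathbb{E}\left[\left(\mathbb{E}[N(n, n+1] - \mu \,|\, \mathcal{F}_0^{-\infty}]\right)^2\right]\right\}^{1/2} \\
&\le \sqrt{2\,\mathbb{E}[N[0, 1]^2]} \sum_{n=1}^\infty \sum_{i=-\infty}^0 K(n, i) \\
&\le \sqrt{2\,\mathbb{E}[N[0, 1]^2]} \sum_{k=1}^\infty \alpha^k \int_0^\infty \int_0^{t_k} \cdots \int_0^{t_2} \int_{-\infty}^0 h(t_k - t_{k-1})\,h(t_{k-1} - t_{k-2}) \cdots h(t_2 - t_1)\,h(t_1 - s)\,ds\,dt_1 \cdots dt_k.
\end{aligned}$$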



Let H(t) := J t °° h(s)ds. It is easy to check that J" °° H(t)dt = J °° th(t)dt < 00 by 
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Assumption [TJ We have 



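\[
\begin{aligned}
&\alpha^k\int_0^{\infty}\int_0^{t_k}\cdots\int_0^{t_2}\int_{-\infty}^{0}h(t_k-t_{k-1})h(t_{k-1}-t_{k-2})\cdots h(t_2-t_1)h(t_1-s)\,ds\,dt_1\cdots dt_k \\
&\quad= \alpha^k\int_0^{\infty}\int_0^{t_k}\cdots\int_0^{t_2}h(t_k-t_{k-1})h(t_{k-1}-t_{k-2})\cdots h(t_2-t_1)H(t_1)\,dt_1\cdots dt_k \\
&\quad= \alpha^k\int_0^{\infty}\cdots\int_{t_{k-2}}^{\infty}\int_{t_{k-1}}^{\infty}h(t_k-t_{k-1})\,dt_k\,h(t_{k-1}-t_{k-2})\,dt_{k-1}\cdots H(t_1)\,dt_1 \\
&\quad= \alpha^k\|h\|_{L^1}^{k-1}\int_0^{\infty}H(t_1)\,dt_1=\alpha^k\|h\|_{L^1}^{k-1}\int_0^{\infty}th(t)\,dt.
\end{aligned}
\tag{2.20}
\]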



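Since $\alpha\|h\|_{L^1}<1$, we conclude that

\[
\begin{aligned}
\sum_{n=1}^{\infty}\left\{\mathbb{E}\left[\left(\mathbb{E}[N(n,n+1]-\mu\,|\,\mathcal{F}^{-\infty}_0]\right)^2\right]\right\}^{1/2}
&\le \sum_{k=1}^{\infty}\sqrt{2\mathbb{E}[N[0,1]^2]}\,\alpha^k\|h\|_{L^1}^{k-1}\int_0^{\infty}th(t)\,dt \\
&= \sqrt{2\mathbb{E}[N[0,1]^2]}\cdot\frac{\alpha}{1-\alpha\|h\|_{L^1}}\int_0^{\infty}th(t)\,dt<\infty.
\end{aligned}
\tag{2.21}
\]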
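The two facts used in (2.20) and (2.21), namely $\int_0^\infty H(t)\,dt=\int_0^\infty th(t)\,dt$ and the geometric sum $\sum_{k\ge1}\alpha^k\|h\|_{L^1}^{k-1}=\alpha/(1-\alpha\|h\|_{L^1})$, can be checked numerically. The sketch below is only an illustration: the exponential kernel $h(t)=ae^{-bt}$ and the constants `a`, `b`, `alpha` are assumptions chosen for the example, not taken from the text.

```python
import math

def trapezoid(func, lower, upper, steps=100_000):
    """Composite trapezoid rule for a smooth function on [lower, upper]."""
    width = (upper - lower) / steps
    total = 0.5 * (func(lower) + func(upper))
    for i in range(1, steps):
        total += func(lower + i * width)
    return total * width

# Assumed example kernel h(t) = a * exp(-b t); then ||h||_{L^1} = a / b.
a, b, alpha = 1.0, 2.0, 1.5          # alpha * ||h||_{L^1} = 0.75 < 1
h_norm = a / b

def h(t):
    return a * math.exp(-b * t)

def H(t):
    # Tail integral H(t) = int_t^infty h(s) ds, in closed form for this kernel.
    return (a / b) * math.exp(-b * t)

# Both integrals should be close to a / b**2 (truncation at 40 is negligible).
integral_of_H = trapezoid(H, 0.0, 40.0)
integral_of_th = trapezoid(lambda t: t * h(t), 0.0, 40.0)

# Geometric series appearing in (2.21).
partial_sum = sum(alpha**k * h_norm**(k - 1) for k in range(1, 200))
closed_form = alpha / (1.0 - alpha * h_norm)
```

For this kernel both integrals equal $a/b^2$, and the series converges precisely because $\alpha\|h\|_{L^1}<1$.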



Hence, by Theorem 8, we have

\[
\frac{N_{[\cdot t]}-\mu[\cdot t]}{\sqrt t}\to\sigma B(\cdot)\quad\text{as } t\to\infty,
\tag{2.22}
\]

where

\[
\sigma^2=\mathbb{E}[(N[0,1]-\mu)^2]+2\sum_{j=1}^{\infty}\mathbb{E}[(N[0,1]-\mu)(N[j,j+1]-\mu)]<\infty.
\tag{2.23}
\]

By Lemma 3, $\sigma>0$. Now, finally, for any $\epsilon>0$, for $t$ sufficiently large,



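\[
\begin{aligned}
&\mathbb{P}\left(\sup_{0\le s\le1}\left|\frac{N_{[st]}-\mu[st]}{\sqrt t}-\frac{N_{st}-\mu st}{\sqrt t}\right|>\epsilon\right) \\
&\quad= \mathbb{P}\left(\sup_{0\le s\le1}\left|(N_{[st]}-N_{st})+\mu(st-[st])\right|>\epsilon\sqrt t\right) \\
&\quad\le \mathbb{P}\left(\sup_{0\le s\le1}\left|N_{[st]}-N_{st}\right|+\mu>\epsilon\sqrt t\right) \\
&\quad\le \mathbb{P}\left(\max_{0\le k\le[t],\,k\in\mathbb{Z}}N[k,k+1]>\epsilon\sqrt t-\mu\right) \\
&\quad\le ([t]+1)\,\mathbb{P}\left(N[0,1]>\epsilon\sqrt t-\mu\right) \\
&\quad\le \frac{[t]+1}{(\epsilon\sqrt t-\mu)^2}\int_{N[0,1]>\epsilon\sqrt t-\mu}N[0,1]^2\,d\mathbb{P}\to0,
\end{aligned}
\tag{2.24}
\]

as $t\to\infty$ by Lemma 2. Hence, we conclude that $\frac{N_{\cdot t}-\mu\cdot t}{\sqrt t}\to\sigma B(\cdot)$ as $t\to\infty$. □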



The following Lemma 1 is used to prove Lemma 2.

Lemma 1. There exists some $\theta>0$ such that $\sup_{t\ge0}\mathbb{E}^{\emptyset}\left[e^{\int_0^t\theta h(t-s)N(ds)}\right]<\infty$.

Proof. Notice first that for any bounded deterministic function $f(\cdot)$,

\[
\exp\left\{\int_0^t f(s)N(ds)-\int_0^t(e^{f(s)}-1)\lambda(s)\,ds\right\}
\tag{2.25}
\]

is a martingale. Therefore, using the Lipschitz assumption of $\lambda(\cdot)$, i.e. $\lambda(z)\le\lambda(0)+\alpha z$, and applying Hölder's inequality, for $\frac1p+\frac1q=1$, we have



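\[
\begin{aligned}
\mathbb{E}^{\emptyset}\left[e^{\int_0^t\theta h(t-s)N(ds)}\right]
&= \mathbb{E}^{\emptyset}\left[e^{\int_0^t\theta h(t-s)N(ds)-\frac1p\int_0^t(e^{p\theta h(t-s)}-1)\lambda(s)\,ds+\frac1p\int_0^t(e^{p\theta h(t-s)}-1)\lambda(s)\,ds}\right] \\
&\le \mathbb{E}^{\emptyset}\left[e^{\frac qp\int_0^t(e^{p\theta h(t-s)}-1)\lambda(s)\,ds}\right]^{1/q} \\
&\le \mathbb{E}^{\emptyset}\left[e^{\frac qp\int_0^t(e^{p\theta h(t-s)}-1)\left(\lambda(0)+\alpha\int_0^s h(s-u)N(du)\right)ds}\right]^{1/q} \\
&\le \mathbb{E}^{\emptyset}\left[e^{\frac qp\int_0^t(e^{p\theta h(t-s)}-1)\alpha\int_0^s h(s-u)N(du)\,ds}\right]^{1/q}\cdot e^{\frac1p\int_0^{\infty}(e^{p\theta h(s)}-1)\lambda(0)\,ds}.
\end{aligned}
\tag{2.26}
\]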



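Let $C(t)=\int_0^t\frac qp(e^{p\theta h(t-s)}-1)\alpha\,ds$. Then, for any $t\in[0,T]$,

\[
\begin{aligned}
&\mathbb{E}^{\emptyset}\left[e^{\frac qp\int_0^t(e^{p\theta h(t-s)}-1)\alpha\int_0^s h(s-u)N(du)\,ds}\right] \\
&\quad= \mathbb{E}^{\emptyset}\left[e^{\frac1{C(t)}\int_0^t\frac qp(e^{p\theta h(t-s)}-1)\alpha\,C(t)\int_0^s h(s-u)N(du)\,ds}\right] \\
&\quad\le \mathbb{E}^{\emptyset}\left[\frac1{C(t)}\int_0^t\frac qp(e^{p\theta h(t-s)}-1)\alpha\,e^{C(t)\int_0^s h(s-u)N(du)}\,ds\right] \\
&\quad\le \sup_{0\le s\le T}\mathbb{E}^{\emptyset}\left[e^{C(\infty)\int_0^s h(s-u)N(du)}\right],
\end{aligned}
\tag{2.27}
\]

where in the first inequality in (2.27), we used Jensen's inequality since $x\mapsto e^x$ is convex and $\frac1{C(t)}\int_0^t\frac qp(e^{p\theta h(t-s)}-1)\alpha\,ds=1$, and in the second inequality in (2.27), we used the fact that $C(t)\le C(\infty)$ and again $\frac1{C(t)}\int_0^t\frac qp(e^{p\theta h(t-s)}-1)\alpha\,ds=1$. Now choose $q>1$ so small that $q\alpha\|h\|_{L^1}<1$. Once $p$ and $q$ are fixed, choose $\theta>0$ so small that

\[
C(\infty)=\int_0^{\infty}\frac qp(e^{p\theta h(s)}-1)\alpha\,ds<\theta.
\tag{2.28}
\]

This implies that for any $t\in[0,T]$,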



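\[
\mathbb{E}^{\emptyset}\left[e^{\int_0^t\theta h(t-s)N(ds)}\right]
\le\sup_{0\le s\le T}\mathbb{E}^{\emptyset}\left[e^{\theta\int_0^s h(s-u)N(du)}\right]^{1/q}\cdot e^{\frac1p\int_0^{\infty}(e^{p\theta h(s)}-1)\lambda(0)\,ds}.
\tag{2.29}
\]

Hence, we conclude that for any $T>0$,

\[
\sup_{0\le t\le T}\mathbb{E}^{\emptyset}\left[e^{\theta\int_0^t h(t-s)N(ds)}\right]\le e^{\int_0^{\infty}(e^{p\theta h(s)}-1)\lambda(0)\,ds}<\infty.
\tag{2.30}
\]

□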

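Lemma 2. There exists some $\theta>0$ such that $\mathbb{E}[e^{\theta N[0,1]}]<\infty$. Hence $\mathbb{E}[N[0,1]^2]<\infty$.

Proof. By Assumption 1, $h(\cdot)$ is positive and decreasing. Thus, $\delta=\inf_{t\in[0,1]}h(t)>0$. Hence,

\[
\mathbb{E}^{\emptyset}\left[e^{\theta\delta N[t-1,t]}\right]\le\mathbb{E}^{\emptyset}\left[e^{\int_0^t\theta h(t-s)N(ds)}\right].
\tag{2.31}
\]

By Lemma 1, we can choose $\theta>0$ so small that

\[
\limsup_{t\to\infty}\mathbb{E}^{\emptyset}\left[e^{\theta\delta N[t-1,t]}\right]<\infty.
\tag{2.32}
\]

Finally, $\mathbb{E}[e^{\theta\delta N[0,1]}]\le\liminf_{t\to\infty}\mathbb{E}^{\emptyset}[e^{\theta\delta N[t-1,t]}]<\infty$.

□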



It is intuitively clear that $\sigma>0$. But still we need a proof.

Lemma 3. $\sigma>0$, where $\sigma$ is defined in (2.23).

Proof. Let $\eta_n=\sum_{j=n}^{\infty}\mathbb{E}[N(j,j+1]-\mu\,|\,\mathcal{F}^{-\infty}_{n+1}]$, where $\mu=\mathbb{E}[N[0,1]]$. $\eta_n$ is well defined because we proved (2.7). To see this, notice that

\[
\|\eta_n\|_2=\left\|\sum_{j=n}^{\infty}\mathbb{E}[N(j,j+1]-\mu\,|\,\mathcal{F}^{-\infty}_{n+1}]\right\|_2
\le\sum_{j=n}^{\infty}\left\|\mathbb{E}[N(j,j+1]-\mu\,|\,\mathcal{F}^{-\infty}_{n+1}]\right\|_2<\infty,
\tag{2.33}
\]

by (2.7). Also, it is easy to check that



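\[
\begin{aligned}
&\mathbb{E}\left[\eta_{n+1}-\eta_n+N(n,n+1]-\mu\,\middle|\,\mathcal{F}^{-\infty}_{n+1}\right] \\
&\quad= \mathbb{E}\Bigg[\sum_{j=n+1}^{\infty}\mathbb{E}[N(j,j+1]-\mu\,|\,\mathcal{F}^{-\infty}_{n+2}]-\sum_{j=n}^{\infty}\mathbb{E}[N(j,j+1]-\mu\,|\,\mathcal{F}^{-\infty}_{n+1}]+N(n,n+1]-\mu\,\Bigg|\,\mathcal{F}^{-\infty}_{n+1}\Bigg] \\
&\quad= \sum_{j=n+1}^{\infty}\mathbb{E}[N(j,j+1]-\mu\,|\,\mathcal{F}^{-\infty}_{n+1}]-\sum_{j=n+1}^{\infty}\mathbb{E}[N(j,j+1]-\mu\,|\,\mathcal{F}^{-\infty}_{n+1}] \\
&\qquad-N(n,n+1]+\mu+N(n,n+1]-\mu=0.
\end{aligned}
\tag{2.34}
\]

Let $Y_n=\eta_{n-1}-\eta_{n-2}+N(n-2,n-1]-\mu$. This is an ergodic, stationary sequence such that $\mathbb{E}[Y_n\,|\,\mathcal{F}^{-\infty}_{n-1}]=0$. By (2.33), $\mathbb{E}[Y_n^2]<\infty$ and by Theorem 9, $S'_{[n\cdot]}/\sqrt n\to\sigma'B(\cdot)$, where $S'_n=\sum_{j=1}^{n}Y_j$. It is clear that $\sigma=\sigma'<\infty$ since for any $\epsilon>0$,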



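\[
\begin{aligned}
&\mathbb{P}\left(\max_{1\le k\le[n],\,k\in\mathbb{Z}}\frac1{\sqrt n}\left|\sum_{i=1}^{k}(\eta_{i-1}-\eta_{i-2})\right|>\epsilon\right) \\
&\quad= \mathbb{P}\left(\max_{1\le k\le[n],\,k\in\mathbb{Z}}|\eta_{k-1}-\eta_{-1}|>\epsilon\sqrt n\right) \\
&\quad\le \mathbb{P}\left(\left\{\max_{1\le k\le[n],\,k\in\mathbb{Z}}|\eta_{k-1}|>\frac{\epsilon\sqrt n}2\right\}\cup\left\{|\eta_{-1}|>\frac{\epsilon\sqrt n}2\right\}\right) \\
&\quad\le \sum_{k=1}^{[n]}\mathbb{P}\left(|\eta_{k-1}|>\frac{\epsilon\sqrt n}2\right)+\mathbb{P}\left(|\eta_{-1}|>\frac{\epsilon\sqrt n}2\right) \\
&\quad\le \frac{4([n]+1)}{\epsilon^2n}\int_{|\eta_{-1}|>\frac{\epsilon\sqrt n}2}|\eta_{-1}|^2\,d\mathbb{P}\to0,
\end{aligned}
\tag{2.35}
\]

as $n\to\infty$, where we used the stationarity of $\mathbb{P}$, Chebychev's inequality and (2.7). Now, it becomes clear that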

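\[
\begin{aligned}
\sigma^2=(\sigma')^2&=\mathbb{E}[Y_1^2] \\
&=\mathbb{E}\left(\eta_0-\eta_{-1}+N(-1,0]-\mu\right)^2 \\
&=\mathbb{E}\left(\sum_{j=0}^{\infty}\mathbb{E}[N(j,j+1]-\mu\,|\,\mathcal{F}^{-\infty}_1]-\sum_{j=0}^{\infty}\mathbb{E}[N(j,j+1]-\mu\,|\,\mathcal{F}^{-\infty}_0]\right)^2.
\end{aligned}
\tag{2.36}
\]

Consider $D=\{\omega:\omega^-\ne\emptyset,\ \omega(0,1]=\emptyset\}$. Notice that $\mathbb{P}(\omega^-=\emptyset)=0$. By Jensen's inequality and Assumption 1, we have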

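\[
\begin{aligned}
\mathbb{P}(D)&=\int\mathbb{P}^{\omega^-}(N(0,1]=0)\,\mathbb{P}(d\omega^-) \\
&=\mathbb{E}\left[e^{-\int_0^1\lambda\left(\sum_{\tau\in\omega^-}h(t-\tau)\right)dt}\right] \\
&\ge\exp\left\{-\mathbb{E}\int_0^1\lambda\Big(\sum_{\tau\in\omega^-}h(t-\tau)\Big)dt\right\} \\
&\ge\exp\left\{-\lambda(0)-\alpha\,\mathbb{E}\int_0^1\sum_{\tau\in\omega^-}h(t-\tau)\,dt\right\} \\
&\ge\exp\left\{-\lambda(0)-\alpha\,\mathbb{E}[N[0,1]]\cdot\|h\|_{L^1}\right\}>0.
\end{aligned}
\tag{2.37}
\]

It is clear that given the event $D$,

\[
\sum_{j=0}^{\infty}\mathbb{E}[N(j,j+1]-\mu\,|\,\mathcal{F}^{-\infty}_1]<\sum_{j=0}^{\infty}\mathbb{E}[N(j,j+1]-\mu\,|\,\mathcal{F}^{-\infty}_0].
\tag{2.38}
\]

Therefore,

\[
\mathbb{P}\left(\sum_{j=0}^{\infty}\mathbb{E}[N(j,j+1]-\mu\,|\,\mathcal{F}^{-\infty}_1]\ne\sum_{j=0}^{\infty}\mathbb{E}[N(j,j+1]-\mu\,|\,\mathcal{F}^{-\infty}_0]\right)>0,
\tag{2.39}
\]

which implies that $\sigma>0$. □

Proof of Theorem 7. By Heyde and Scott [57], Strassen's invariance principle holds if we have (2.7) and $\sigma>0$. □

Chapter 3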

Process-Level Large Deviations 
for Nonlinear Hawkes Processes 



3.1 Main Results 

In this chapter, we prove a process-level, i.e. level-3 large deviation principle for nonlinear Hawkes processes. As a corollary, a level-1 large deviation principle is obtained by a contraction principle.

Let us recall that $N$ is a nonlinear Hawkes process with intensity

\[
\lambda_t:=\lambda\left(\int_{(-\infty,t)}h(t-s)N(ds)\right).
\tag{3.1}
\]



Throughout this chapter, we assume that

The exciting function $h(t)$ is positive, continuous and decreasing for $t\ge0$ and $h(t)=0$ for any $t<0$. We also assume that $\int_0^{\infty}h(t)\,dt<\infty$.

The rate function $\lambda(\cdot):[0,\infty)\to\mathbb{R}^+$ is increasing and $\lim_{z\to\infty}\frac{\lambda(z)}z=0$. We also assume that $\lambda(\cdot)$ is Lipschitz with constant $\alpha>0$, i.e. $|\lambda(x)-\lambda(y)|\le\alpha|x-y|$ for any $x,y\ge0$.
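To make the two assumptions concrete, here is a minimal simulation sketch of a nonlinear Hawkes process started from empty history, using a standard thinning scheme (not a construction from the text). It relies on the assumptions above: since $h$ is decreasing and $\lambda$ is increasing, the intensity evaluated just after the current time bounds the intensity until the next point. The specific choices `rate(z) = 1 + log(1 + z)` and `kernel(t) = exp(-t)` are illustrative assumptions that satisfy the stated conditions; they are not taken from the text.

```python
import math
import random

def rate(z):
    # Assumed example: increasing, 1-Lipschitz, and rate(z) / z -> 0 as z -> infinity.
    return 1.0 + math.log(1.0 + z)

def kernel(t):
    # Assumed example: positive, continuous, decreasing and integrable on [0, infinity).
    return math.exp(-t)

def simulate_hawkes(rate, kernel, horizon, seed=0):
    """Simulate points on (0, horizon] by thinning, starting from empty history."""
    rng = random.Random(seed)
    points = []
    t = 0.0
    while True:
        # Intensity just after time t; it dominates the intensity until the next
        # point because the kernel is decreasing and the rate is increasing.
        bound = rate(sum(kernel(t - s) for s in points))
        t += rng.expovariate(bound)
        if t > horizon:
            return points
        current = rate(sum(kernel(t - s) for s in points))
        # Accept the candidate with probability current / bound.
        if rng.random() * bound <= current:
            points.append(t)

events = simulate_hawkes(rate, kernel, horizon=50.0, seed=1)
empirical_rate = len(events) / 50.0   # a rough estimate of N_t / t
```

The quadratic cost of recomputing the sum over past points is acceptable for short horizons; for an exponential kernel a recursive update would be the usual optimisation.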

Let $\Omega$ be the set of countable, locally finite subsets of $\mathbb{R}$ and for any $\omega\in\Omega$ and $A\subset\mathbb{R}$, write $\omega(A):=\omega\cap A$. For any $t\in\mathbb{R}$, we write $\omega(t)=\omega(\{t\})$. Let $N(A)=\#|\omega\cap A|$ denote the number of points in the set $A$ for any $A\subset\mathbb{R}$. We also use the notation $N_t$ to denote $N[0,t]$, the number of points up to time $t$, starting from time 0. We define the shift operator $\theta_t$ by $\theta_t(\omega)(s)=\omega(t+s)$. We equip the sample space $\Omega$ with the topology in which the convergence $\omega_n\to\omega$ as $n\to\infty$ is defined by

\[
\sum_{\tau\in\omega_n}f(\tau)\to\sum_{\tau\in\omega}f(\tau),
\tag{3.2}
\]

for any continuous $f$ with compact support.

This topology is equivalent to the vague topology for random measures, for 
which, see for example Grandell [15]. One can equip the space of locally finite 
random measures with the vague topology. The subspace of integer valued random 
measures is then the space of point processes. A simple point process is a point
process without multiple jumps. The space of point processes is closed. But the
space of simple point processes is not closed. 

Denote $\mathcal{F}^s_t=\sigma(\omega[s,t])$ for any $s<t$, i.e. the $\sigma$-algebra generated by all the possible configurations of points in the interval $[s,t]$. Denote by $\mathcal{M}(\Omega)$ the space of probability measures on $\Omega$. We also define $\mathcal{M}_S(\Omega)$ as the space of simple point processes that are invariant with respect to $\theta_t$ with bounded first moment, i.e. for any $Q\in\mathcal{M}_S(\Omega)$, $\mathbb{E}^Q[N[0,1]]<\infty$. Define $\mathcal{M}_E(\Omega)$ as the set of ergodic simple point processes in $\mathcal{M}_S(\Omega)$. We define the topology of $\mathcal{M}_S(\Omega)$ as follows. For a sequence $Q_n$ in $\mathcal{M}_S(\Omega)$ and $Q\in\mathcal{M}_S(\Omega)$, we say $Q_n\to Q$ as $n\to\infty$ if and only if

\[
\int f\,dQ_n\to\int f\,dQ,
\tag{3.3}
\]

as $n\to\infty$ for any continuous and bounded $f$ and

\[
\int N[0,1](\omega)\,Q_n(d\omega)\to\int N[0,1](\omega)\,Q(d\omega),
\tag{3.4}
\]

as $n\to\infty$. In other words, the topology is the weak topology strengthened by the convergence of the first moment of $N[0,1]$. For any $Q_1,Q_2$ in $\mathcal{M}_S(\Omega)$, one can define the metric $d(\cdot,\cdot)$ by

\[
d(Q_1,Q_2)=d_p(Q_1,Q_2)+\left|\mathbb{E}^{Q_1}[N[0,1]]-\mathbb{E}^{Q_2}[N[0,1]]\right|,
\tag{3.5}
\]

where $d_p(\cdot,\cdot)$ is the usual Prokhorov metric. Because this is an unusual topology, the compactness is different from that in the usual weak topology; later, when we prove the exponential tightness, we need to take some extra care. See Lemma 21 and (iii) of Lemma 20.



We denote by $C(\Omega)$ the set of real-valued continuous functions on $\Omega$. We similarly define $C(\Omega\times\mathbb{R})$. We also denote by $B(\mathcal{F}^{-\infty}_s)$ the set of all bounded $\mathcal{F}^{-\infty}_s$ progressively measurable and $\mathcal{F}^{-\infty}_s$ predictable functions.

Before we proceed, recall that a sequence $(P_n)_{n\in\mathbb{N}}$ of probability measures on a topological space $X$ satisfies the large deviation principle (LDP) with rate function $I:X\to\mathbb{R}$ if $I$ is non-negative, lower semicontinuous and for any measurable set $A$,

\[
-\inf_{x\in A^o}I(x)\le\liminf_{n\to\infty}\frac1n\log P_n(A)\le\limsup_{n\to\infty}\frac1n\log P_n(A)\le-\inf_{x\in\overline A}I(x).
\tag{3.6}
\]

Here, $A^o$ is the interior of $A$ and $\overline A$ is its closure. See Dembo and Zeitouni or Varadhan [106] for general background regarding large deviations and their applications. Also Varadhan [107] has an excellent survey article on this subject.

In the pioneering work by Donsker and Varadhan [31], they obtained a level-3 
large deviation result for certain stationary Markov processes. 

We would like to prove the large deviation principle for nonlinear Hawkes processes by proving a process-level, also known as level-3, large deviation principle first. We can then use the contraction principle to obtain the level-1 large deviation principle for $(N_t/t\in\cdot)$.

Let us define the empirical measure for the process as

\[
R_{t,\omega}(A)=\frac1t\int_0^t\chi_A(\theta_s\omega_t)\,ds,
\tag{3.7}
\]

for any $A$, where $\omega_t(s)=\omega(s)$ for $0\le s\le t$ and $\omega_t(s+t)=\omega_t(s)$ for any $s$. Donsker and Varadhan [31] proved that in the case when $\Omega$ is a space of càdlàg functions $\omega(\cdot)$ on $-\infty<t<\infty$ endowed with Skorohod topology and taking values in a Polish space $X$, under certain conditions, $P^{0,x}(R_{t,\omega}\in\cdot)$ satisfies a large deviation principle, where $P^{0,x}$ is a Markov process on $\Omega_0^+$ with initial value $x\in X$. The rate function $H(Q)$ is some entropy function.

Let $h(\alpha,\beta)_{\Sigma}$ be the relative entropy of $\alpha$ with respect to $\beta$ restricted to the $\sigma$-algebra $\Sigma$. For any $Q\in\mathcal{M}_S(\Omega)$, let $Q^{\omega^-}$ be the regular conditional probability distribution of $Q$. Similarly we define $P^{\omega^-}$.

Let us define the entropy function $H(Q)$ as



\[
H(Q)=\mathbb{E}^Q\left[h(Q^{\omega^-},P^{\omega^-})_{\mathcal{F}^0_1}\right].
\tag{3.8}
\]

Notice that $P^{\omega^-}$ describes the Hawkes process conditional on the past history $\omega^-$. It has rate $\lambda\left(\sum_{\tau\in\omega[0,s)\cup\omega^-}h(s-\tau)\right)$ at time $0\le s\le1$, which is well defined for almost every $\omega^-$ under $Q$ if $\mathbb{E}^Q[N[0,1]]<\infty$, since $\mathbb{E}^Q\left[\sum_{\tau\in\omega^-}h(-\tau)\right]=\|h\|_{L^1}\mathbb{E}^Q[N[0,1]]<\infty$ implies $\sum_{\tau\in\omega^-}h(s-\tau)\le\sum_{\tau\in\omega^-}h(-\tau)<\infty$ for all $0\le s\le1$.

When $H(Q)<\infty$, $h(Q^{\omega^-},P^{\omega^-})<\infty$ for a.e. $\omega^-$ under $Q$, which implies that $Q^{\omega^-}\ll P^{\omega^-}$ on $\mathcal{F}^0_1$. By the theory of absolute continuity of point processes, see for example Chapter 19 of Liptser and Shiryaev [72] or Chapter 13 of Daley and Vere-Jones [27], the compensator of $Q^{\omega^-}$ is absolutely continuous, i.e. it has some density $\hat\lambda$ say, such that by the Girsanov formula,



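\[
\begin{aligned}
H(Q)&=\int_{\Omega^-}\left[\int_0^1(\lambda-\hat\lambda)\,ds+\int_0^1\log(\hat\lambda/\lambda)\,dN_s\right]dQ^{\omega^-}\,Q(d\omega^-) \\
&=\int_{\Omega}\int_0^1\left[\lambda(\omega,s)-\hat\lambda(\omega,s)+\log\left(\frac{\hat\lambda(\omega,s)}{\lambda(\omega,s)}\right)\hat\lambda(\omega,s)\right]ds\,Q(d\omega),
\end{aligned}
\tag{3.9}
\]

where $\lambda=\lambda\left(\sum_{\tau\in\omega[0,s)\cup\omega^-}h(s-\tau)\right)$. Both $\lambda$ and $\hat\lambda$ are $\mathcal{F}^{-\infty}_s$-predictable for $0\le s\le1$. For the equality in (3.9), we used the fact that $N_t-\int_0^t\hat\lambda(\omega,s)\,ds$ is a martingale under $Q$ and for any $f(\omega,s)$ which is bounded, $\mathcal{F}^{-\infty}_s$ progressively measurable and predictable, we have

\[
\int_{\Omega}\int_0^1f(\omega,s)\,dN_s\,Q(d\omega)=\int_{\Omega}\int_0^1f(\omega,s)\hat\lambda(\omega,s)\,ds\,Q(d\omega).
\tag{3.10}
\]

We will use the above fact repeatedly in this chapter.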

The following theorem is the main result of this chapter. 

Theorem 10. For any open set $G\subset\mathcal{M}_S(\Omega)$,

\[
\liminf_{t\to\infty}\frac1t\log\mathbb{P}(R_{t,\omega}\in G)\ge-\inf_{Q\in G}H(Q),
\tag{3.11}
\]

and for any closed set $C\subset\mathcal{M}_S(\Omega)$,

\[
\limsup_{t\to\infty}\frac1t\log\mathbb{P}(R_{t,\omega}\in C)\le-\inf_{Q\in C}H(Q).
\tag{3.12}
\]

We will prove the lower bound in Section 3.2, the upper bound in Section 3.3 and the superexponential estimates that are needed in the proof of the upper bound in Section 3.4.

Once we establish the level-3 large deviation result, we can obtain the large 
deviation principle for $(N_t/t\in\cdot)$ directly by using the contraction principle.

Theorem 11. $(N_t/t\in\cdot)$ satisfies a large deviation principle with the rate function $I(\cdot)$ given by

\[
I(x)=\inf_{Q\in\mathcal{M}_S(\Omega),\ \mathbb{E}^Q[N[0,1]]=x}H(Q).
\tag{3.13}
\]

Proof. Since $Q\mapsto\mathbb{E}^Q[N[0,1]]$ is continuous, $\int_{\Omega}N[0,1]\,dR_{t,\omega}$ satisfies a large deviation principle with the rate function $I(\cdot)$ by the contraction principle. (For a discussion on the contraction principle, see for example Varadhan [106].)



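\[
\begin{aligned}
\int_{\Omega}N[0,1]\,dR_{t,\omega}&=\frac1t\int_0^tN[0,1](\theta_s\omega_t)\,ds \\
&=\frac1t\int_0^{t-1}N[s,s+1](\omega)\,ds+\frac1t\int_{t-1}^tN[s,s+1](\omega_t)\,ds.
\end{aligned}
\tag{3.14}
\]

Notice that

\[
0\le\frac1t\int_{t-1}^tN[s,s+1](\omega_t)\,ds\le\frac1t\left(N[t-1,t](\omega)+N[0,1](\omega)\right),
\tag{3.15}
\]

and

\[
\frac1t\int_0^{t-1}N[s,s+1](\omega)\,ds=\frac1t\left[\int_{t-1}^tN_s(\omega)\,ds-\int_0^1N_s(\omega)\,ds\right]\le\frac{N_t}t,
\tag{3.16}
\]

and

\[
\frac1t\int_0^{t-1}N[s,s+1](\omega)\,ds\ge\frac{N_{t-1}-N_1}t=\frac{N_t}t-\frac{N[t-1,t]+N_1}t.
\tag{3.17}
\]

Hence,

\[
\frac{N_t}t-\frac{N[t-1,t]+N_1}t\le\int_{\Omega}N[0,1]\,dR_{t,\omega}\le\frac{N_t}t+\frac{N[t-1,t]+N_1}t.
\tag{3.18}
\]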



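For the lower bound, for any open ball $B_\epsilon(x)$ centered at $x$ with radius $\epsilon>0$,

\[
\mathbb{P}\left(\frac{N_t}t\in B_\epsilon(x)\right)\ge\mathbb{P}\left(\int_{\Omega}N[0,1]\,dR_{t,\omega}\in B_{\epsilon/2}(x)\right)-\mathbb{P}\left(\frac{N[t-1,t]}t\ge\frac\epsilon4\right)-\mathbb{P}\left(\frac{N_1}t\ge\frac\epsilon4\right).
\tag{3.19}
\]

For the upper bound, for any closed set $C$ and $C^\epsilon=\bigcup_{x\in C}\overline{B_\epsilon(x)}$,

\[
\mathbb{P}\left(\frac{N_t}t\in C\right)\le\mathbb{P}\left(\int_{\Omega}N[0,1]\,dR_{t,\omega}\in C^\epsilon\right)+\mathbb{P}\left(\frac{N[t-1,t]}t\ge\frac\epsilon4\right)+\mathbb{P}\left(\frac{N_1}t\ge\frac\epsilon4\right).
\tag{3.20}
\]

Finally, by Lemma 16, we have the following superexponential estimates

\[
\limsup_{t\to\infty}\frac1t\log\mathbb{P}\left(\frac{N[t-1,t]}t\ge\frac\epsilon4\right)=\limsup_{t\to\infty}\frac1t\log\mathbb{P}\left(\frac{N_1}t\ge\frac\epsilon4\right)=-\infty.
\tag{3.21}
\]

Hence, for the lower bound, we have

\[
\liminf_{t\to\infty}\frac1t\log\mathbb{P}\left(\frac{N_t}t\in B_\epsilon(x)\right)\ge-I(x),
\tag{3.22}
\]

and for the upper bound, we have

\[
\limsup_{t\to\infty}\frac1t\log\mathbb{P}\left(\frac{N_t}t\in C\right)\le-\inf_{x\in C^\epsilon}I(x),
\tag{3.23}
\]

which holds for any $\epsilon>0$. Letting $\epsilon\downarrow0$, we get the desired result.

□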



3.2 Lower Bound 



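Lemma 4. For any $\lambda,\hat\lambda>0$, $\lambda-\hat\lambda+\hat\lambda\log(\hat\lambda/\lambda)\ge0$.

Proof. Write $\lambda-\hat\lambda+\hat\lambda\log(\hat\lambda/\lambda)=\hat\lambda\left[(\lambda/\hat\lambda)-1-\log(\lambda/\hat\lambda)\right]$. Thus, it is sufficient to show that $F(x)=x-1-\log x\ge0$ for any $x>0$. Note that $F(0)=F(\infty)=\infty$ and $F'(x)=1-\frac1x<0$ when $0<x<1$ and $F'(x)>0$ when $x>1$ and finally $F(1)=0$. Hence $F(x)\ge0$ for any $x>0$. □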
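As a quick sanity check of Lemma 4, the integrand $\lambda-\hat\lambda+\hat\lambda\log(\hat\lambda/\lambda)$ can be evaluated on a grid of positive values. This is only a numerical illustration; the grid range and the function name `entropy_integrand` are choices made for the example.

```python
import math

def entropy_integrand(lam, lam_hat):
    """lam - lam_hat + lam_hat * log(lam_hat / lam), for lam, lam_hat > 0."""
    return lam - lam_hat + lam_hat * math.log(lam_hat / lam)

# Grid of positive values 0.05, 0.10, ..., 10.0 for both arguments.
grid = [0.05 * k for k in range(1, 201)]
smallest_value = min(entropy_integrand(lam, lam_hat)
                     for lam in grid for lam_hat in grid)

# The minimum is attained on the diagonal lam == lam_hat, where the integrand vanishes.
value_on_diagonal = entropy_integrand(2.0, 2.0)
```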
Lemma 5. Assume $H(Q)<\infty$. Then,

\[
\mathbb{E}^Q[N[0,1]]\le C_1+C_2H(Q),
\tag{3.24}
\]

where $C_1,C_2>0$ are some constants independent of $Q$.

Proof. If $H(Q)<\infty$, then $h(Q^{\omega^-},P^{\omega^-})_{\mathcal{F}^0_1}<\infty$ for a.e. $\omega^-$ under $Q$, which implies that $Q^{\omega^-}\ll P^{\omega^-}$ and thus $\hat A_t\ll A_t$, where $\hat A_t$ and $A_t$ are the compensators of $N_t$ under $Q^{\omega^-}$ and $P^{\omega^-}$ respectively. (For the theory of absolute continuity of point processes and the Girsanov formula, see for example Liptser and Shiryaev [72] or Daley and Vere-Jones [27].) Since $A_t=\int_0^t\lambda(\omega,s)\,ds$, we have $\hat A_t=\int_0^t\hat\lambda(\omega,s)\,ds$ for some $\hat\lambda$. By the Girsanov formula,

\[
H(Q)=\mathbb{E}^Q\left[\int_0^1\left(\lambda-\hat\lambda+\log(\hat\lambda/\lambda)\hat\lambda\right)ds\right].
\tag{3.25}
\]

Notice that $\mathbb{E}^Q[N[0,1]]=\int\int_0^1\hat\lambda\,ds\,dQ$.

\[
\begin{aligned}
\int\int_0^1\lambda\,ds\,dQ&\le\epsilon\int\int_0^1\sum_{\tau<s}h(s-\tau)\,ds\,dQ+C_\epsilon \\
&\le\epsilon\int h(0)N[0,1]\,dQ+\epsilon\int\sum_{\tau<0}h(-\tau)\,dQ+C_\epsilon \\
&=\epsilon(h(0)+\|h\|_{L^1})\mathbb{E}^Q[N[0,1]]+C_\epsilon \\
&=\epsilon(h(0)+\|h\|_{L^1})\int\int_0^1\hat\lambda\,ds\,dQ+C_\epsilon.
\end{aligned}
\tag{3.26}
\]

Therefore, we have

\[
\int\int_0^1\hat\lambda\cdot1_{\hat\lambda<K\lambda}\,ds\,dQ\le K\epsilon(h(0)+\|h\|_{L^1})\int\int_0^1\hat\lambda\,ds\,dQ+KC_\epsilon.
\tag{3.27}
\]

On the other hand, by Lemma 4,



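\[
\begin{aligned}
H(Q)&\ge\int\int_0^1\left[\lambda-\hat\lambda+\hat\lambda\log(\hat\lambda/\lambda)\right]\cdot1_{\hat\lambda\ge K\lambda}\,ds\,dQ \\
&\ge(\log K-1)\int\int_0^1\hat\lambda\cdot1_{\hat\lambda\ge K\lambda}\,ds\,dQ.
\end{aligned}
\tag{3.28}
\]

Thus,

\[
\int\int_0^1\hat\lambda\,ds\,dQ\le K\epsilon(h(0)+\|h\|_{L^1})\int\int_0^1\hat\lambda\,ds\,dQ+KC_\epsilon+\frac{H(Q)}{\log K-1}.
\tag{3.29}
\]

Choosing $K>e$ and $\epsilon<\frac1{K(h(0)+\|h\|_{L^1})}$, we get

\[
\mathbb{E}^Q[N[0,1]]\le\frac{KC_\epsilon}{1-K\epsilon(h(0)+\|h\|_{L^1})}+\frac{H(Q)}{(\log K-1)\left(1-K\epsilon(h(0)+\|h\|_{L^1})\right)}.
\tag{3.30}
\]

□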



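Lemma 6. We have the following alternative expression for $H(Q)$.

\[
H(Q)=\sup_{f(\omega,s)\in B(\mathcal{F}^{-\infty}_s)\cap C(\Omega\times\mathbb{R}),\ 0\le s\le1}\mathbb{E}^Q\left[\int_0^1\lambda(1-e^f)\,ds+\int_0^1f\,dN_s\right].
\tag{3.31}
\]

Proof. $\mathbb{E}^Q[N[0,1]]<\infty$ implies that $\mathbb{E}^{Q^{\omega^-}}[N[0,1]]<\infty$ for almost every $\omega^-$ under $Q$, also $\sum_{\tau\in\omega^-}h(-\tau)<\infty$ since $\mathbb{E}^Q\left[\sum_{\tau\in\omega^-}h(-\tau)\right]=\|h\|_{L^1}\mathbb{E}^Q[N[0,1]]<\infty$. Thus,

\[
\begin{aligned}
\mathbb{E}^{P^{\omega^-}}[N[0,1]]&=\mathbb{E}^{P^{\omega^-}}\left[\int_0^1\lambda\Big(\sum_{\tau\in\omega[0,s)\cup\omega^-}h(s-\tau)\Big)ds\right] \\
&\le C_\epsilon+\epsilon h(0)\mathbb{E}^{P^{\omega^-}}[N[0,1]]+\epsilon\sum_{\tau\in\omega^-}h(-\tau),
\end{aligned}
\tag{3.32}
\]

so $\mathbb{E}^{P^{\omega^-}}[N[0,1]]<\infty$ by choice of $\epsilon<\frac1{h(0)}$.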

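By the theory of absolute continuity of point processes, see for example Chapter 13 of Daley and Vere-Jones [27], if $\mathbb{E}^{Q^{\omega^-}}[N[0,1]],\mathbb{E}^{P^{\omega^-}}[N[0,1]]<\infty$, then $Q^{\omega^-}\ll P^{\omega^-}$ if and only if $\hat A_t\ll A_t$, where $\hat A_t$ and $A_t=\int_0^t\lambda(\omega^-,\omega,s)\,ds$ are the compensators of $N_t$ under $Q^{\omega^-}$ and $P^{\omega^-}$ respectively. If that is the case, we can write $\hat A_t=\int_0^t\hat\lambda(\omega^-,\omega,s)\,ds$ for some $\hat\lambda$ and there is the Girsanov formula

\[
\log\frac{dQ^{\omega^-}}{dP^{\omega^-}}\bigg|_{\mathcal{F}^0_1}=\int_0^1(\lambda-\hat\lambda)\,ds+\int_0^1\log(\hat\lambda/\lambda)\,dN_s,
\tag{3.33}
\]

which implies that

\[
H(Q)=\mathbb{E}^Q\left[\int_0^1\left(\lambda-\hat\lambda+\log(\hat\lambda/\lambda)\hat\lambda\right)ds\right].
\tag{3.34}
\]

For any $f$, $\hat\lambda f+(1-e^f)\lambda\le\hat\lambda\log(\hat\lambda/\lambda)+\lambda-\hat\lambda$ and the equality is achieved when $f=\log(\hat\lambda/\lambda)$. Thus, clearly, we have

\[
\sup_{f(\omega,s)\in B(\mathcal{F}^{-\infty}_s)\cap C(\Omega\times\mathbb{R}),\ 0\le s\le1}\mathbb{E}^Q\left[\int_0^1\lambda(1-e^f)\,ds+\int_0^1f\,dN_s\right]\le H(Q).
\tag{3.35}
\]

On the other hand, we can always find a sequence $f_n$ convergent to $\log(\hat\lambda/\lambda)$ and by Fatou's lemma, we get the opposite inequality.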
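The pointwise inequality behind (3.35) is a one-dimensional concave maximisation in $f$, and it can be illustrated numerically. The sketch below fixes two arbitrary positive values for $\lambda$ and $\hat\lambda$ (assumptions made for the example only) and compares the objective on a grid of $f$ with the claimed maximum at $f=\log(\hat\lambda/\lambda)$.

```python
import math

def pointwise_objective(f, lam, lam_hat):
    """lam_hat * f + (1 - e^f) * lam, the integrand of the variational formula."""
    return lam_hat * f + (1.0 - math.exp(f)) * lam

def entropy_integrand(lam, lam_hat):
    """lam_hat * log(lam_hat / lam) + lam - lam_hat, the claimed supremum over f."""
    return lam_hat * math.log(lam_hat / lam) + lam - lam_hat

lam, lam_hat = 1.5, 0.7                      # arbitrary positive example values
f_star = math.log(lam_hat / lam)             # maximiser: lam_hat - lam * e^f = 0
f_grid = [-5.0 + 0.01 * k for k in range(1001)]

best_on_grid = max(pointwise_objective(f, lam, lam_hat) for f in f_grid)
value_at_maximiser = pointwise_objective(f_star, lam, lam_hat)
upper_bound = entropy_integrand(lam, lam_hat)
```

Setting the derivative $\hat\lambda-\lambda e^f$ to zero gives the maximiser, which is why the optimal test function in Lemma 6 is $\log(\hat\lambda/\lambda)$.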

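Now, assume that we do not have $Q^{\omega^-}\ll P^{\omega^-}$ for a.e. $\omega^-$ under $Q$. That implies that $H(Q)=\infty$. We want to show that

\[
\sup_{f(\omega,s)\in B(\mathcal{F}^{-\infty}_s)\cap C(\Omega\times\mathbb{R}),\ 0\le s\le1}\mathbb{E}^Q\left[\int_0^1\lambda(1-e^f)\,ds+\int_0^1f\,dN_s\right]=\infty.
\tag{3.36}
\]

Let us assume that

\[
\sup_{f(\omega,s)\in B(\mathcal{F}^{-\infty}_s)\cap C(\Omega\times\mathbb{R}),\ 0\le s\le1}\mathbb{E}^Q\left[\int_0^1\lambda(1-e^f)\,ds+\int_0^1f\,dN_s\right]<\infty.
\tag{3.37}
\]

We want to prove that $H(Q)<\infty$.

Let $P^{\omega^-}_\epsilon$ be the point process on $[0,1]$ with compensator $A_t+\epsilon\hat A_t$. Clearly $\hat A_t\ll A_t+\epsilon\hat A_t$ and $Q^{\omega^-}\ll P^{\omega^-}_\epsilon$.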

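For any $f$,

\[
\begin{aligned}
&\mathbb{E}^Q\left[\int_0^1(1-e^f)\,d(A_s+\epsilon\hat A_s)+f\,d\hat A_s\right] \\
&\quad=\mathbb{E}^Q\left[\int_0^1(1-e^f)\chi_{f<0}\,d(A_s+\epsilon\hat A_s)+f\chi_{f<0}\,d\hat A_s\right] \\
&\qquad+\mathbb{E}^Q\left[\int_0^1(1-e^f)\chi_{f\ge0}\,d(A_s+\epsilon\hat A_s)+f\chi_{f\ge0}\,d\hat A_s\right] \\
&\quad\le\mathbb{E}^Q\left[\int_0^1d(A_s+\epsilon\hat A_s)\right]+\mathbb{E}^Q\left[\int_0^1(1-e^f)\chi_{f\ge0}\,dA_s+f\chi_{f\ge0}\,d\hat A_s\right] \\
&\quad=\mathbb{E}^Q\left[\int_0^1d(A_s+\epsilon\hat A_s)\right]+\mathbb{E}^Q\left[\int_0^1(1-e^{f\chi_{f\ge0}})\,dA_s+f\chi_{f\ge0}\,d\hat A_s\right] \\
&\quad\le C_\delta+\delta(h(0)+\|h\|_{L^1})\mathbb{E}^Q[N[0,1]]+\epsilon\,\mathbb{E}^Q[N[0,1]] \\
&\qquad+\sup_{f(\omega,s)\in B(\mathcal{F}^{-\infty}_s)\cap C(\Omega\times\mathbb{R}),\ 0\le s\le1}\mathbb{E}^Q\left[\int_0^1\lambda(1-e^f)\,ds+\int_0^1f\,dN_s\right]<\infty.
\end{aligned}
\tag{3.38}
\]

Therefore,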



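\[
\begin{aligned}
\infty&>\liminf_{\epsilon\downarrow0}\sup_{f(\omega,s)\in B(\mathcal{F}^{-\infty}_s)\cap C(\Omega\times\mathbb{R}),\ 0\le s\le1}\mathbb{E}^Q\left[\int_0^1(1-e^f)\,d(A_s+\epsilon\hat A_s)+f\,d\hat A_s\right] \\
&=\liminf_{\epsilon\downarrow0}\sup_{f(\omega,s)\in B(\mathcal{F}^{-\infty}_s)\cap C(\Omega\times\mathbb{R}),\ 0\le s\le1}\mathbb{E}^Q\left[\int_0^1\left(1-e^f+f\cdot\frac{d\hat A_s}{d(A_s+\epsilon\hat A_s)}\right)d(A_s+\epsilon\hat A_s)\right] \\
&=\liminf_{\epsilon\downarrow0}\mathbb{E}^Q\left[h(Q^{\omega^-},P^{\omega^-}_\epsilon)_{\mathcal{F}^0_1}\right] \\
&\ge\mathbb{E}^Q\left[h(Q^{\omega^-},P^{\omega^-})_{\mathcal{F}^0_1}\right]=H(Q),
\end{aligned}
\tag{3.39}
\]

by lower semicontinuity of the relative entropy $h(\cdot,\cdot)$, Fatou's lemma, and the fact that $P^{\omega^-}_\epsilon\to P^{\omega^-}$ weakly as $\epsilon\downarrow0$. Hence $H(Q)<\infty$. □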

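Lemma 7. $H(Q)$ is lower semicontinuous and convex in $Q$.

Proof. By Lemma 6, we can rewrite $H(Q)$ as

\[
\begin{aligned}
H(Q)&=\sup_{f(\omega,s)\in B(\mathcal{F}^{-\infty}_s)\cap C(\Omega\times\mathbb{R}),\ 0\le s\le1}\mathbb{E}^Q\left[\int_0^1\lambda(1-e^f)+\hat\lambda f\,ds\right] \\
&=\sup_{f(\omega,s)\in B(\mathcal{F}^{-\infty}_s)\cap C(\Omega\times\mathbb{R}),\ 0\le s\le1}\mathbb{E}^Q\left[\int_0^1\lambda(1-e^f)\,ds+\int_0^1f\,dN_s\right].
\end{aligned}
\tag{3.40}
\]

If $Q_n\to Q$, then $\mathbb{E}^{Q_n}[N[0,1]]\to\mathbb{E}^Q[N[0,1]]$ and $Q_n\to Q$ weakly. Since $f(\omega,s)\in C(\Omega\times\mathbb{R})\cap B(\mathcal{F}^{-\infty}_s)$, $\int_0^1f(\omega,s)\,dN_s$ is continuous on $\Omega$, and since $f$ is uniformly bounded, $\int_0^1f(\omega,s)\,dN_s\le\|f\|_{L^\infty}N[0,1]$. Hence,

\[
\mathbb{E}^{Q_n}\left[\int_0^1f(\omega,s)\,dN_s\right]\to\mathbb{E}^Q\left[\int_0^1f(\omega,s)\,dN_s\right].
\tag{3.41}
\]

Let $\lambda^M=\lambda\left(\sum_{\tau<s}h^M(s-\tau)\right)$, where $h^M(s)=h(s)\chi_{s\le M}$. Then, $\lambda^M(\omega,s)\in C(\Omega\times\mathbb{R})$ and thus $\int_0^1\lambda^M(1-e^{f(\omega,s)})\,ds\in C(\Omega)$. Also, $\left|\int_0^1\lambda^M(1-e^{f(\omega,s)})\,ds\right|\le K(1+e^{\|f\|_{L^\infty}})N[-M,1]$, where $K>0$ is some constant. Therefore,

\[
\mathbb{E}^{Q_n}\left[\int_0^1\lambda^M(1-e^{f(\omega,s)})\,ds\right]\to\mathbb{E}^Q\left[\int_0^1\lambda^M(1-e^{f(\omega,s)})\,ds\right],
\tag{3.42}
\]

as $n\to\infty$. Next, notice that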



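\[
\begin{aligned}
&\left|\mathbb{E}^Q\left[\int_0^1\lambda^M(1-e^{f(\omega,s)})\,ds\right]-\mathbb{E}^Q\left[\int_0^1\lambda(1-e^{f(\omega,s)})\,ds\right]\right| \\
&\quad\le(1+e^{\|f\|_{L^\infty}})\,\alpha\,\mathbb{E}^Q[N[0,1]]\int_M^{\infty}h(s)\,ds\to0,
\end{aligned}
\tag{3.43}
\]

as $M\to\infty$. Similarly, we have

\[
\limsup_{M\to\infty}\limsup_{n\to\infty}\left|\mathbb{E}^{Q_n}\left[\int_0^1\lambda^M(1-e^{f(\omega,s)})\,ds\right]-\mathbb{E}^{Q_n}\left[\int_0^1\lambda(1-e^{f(\omega,s)})\,ds\right]\right|=0.
\tag{3.44}
\]

Hence,

\[
\mathbb{E}^{Q_n}\left[\int_0^1\lambda(\omega,s)(1-e^{f(\omega,s)})\,ds\right]\to\mathbb{E}^Q\left[\int_0^1\lambda(\omega,s)(1-e^{f(\omega,s)})\,ds\right].
\tag{3.45}
\]

The supremum is taken over linear functionals of $Q$, each of which is continuous in $Q$; therefore the supremum over these linear functionals is lower semicontinuous. Similarly, since in the variational formula for $H(Q)$ in Lemma 6 the supremum is taken over linear functionals of $Q$, $H(Q)$ is convex in $Q$. □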

Lemma 8. $H(Q)$ is linear in $Q$.

Proof. It is in general true that the process-level entropy function $H(Q)$ is linear in $Q$. Following the arguments in Donsker and Varadhan [31], there exists a subset $\Omega_0\subset\Omega$ which is $\mathcal{F}^{-\infty}_0$ measurable and a $\mathcal{F}^{-\infty}_0$ measurable map $\hat Q:\Omega_0\to\mathcal{M}_E(\Omega)$ such that $Q(\Omega_0)=1$ for all $Q\in\mathcal{M}_S(\Omega)$ and $Q(\omega:\hat Q=Q)=1$ for all $Q\in\mathcal{M}_E(\Omega)$. Therefore, there exists a universal version, say $\hat Q^{\omega^-}$, independent of $Q$ such that $\int\hat Q^{\omega^-}Q(d\omega^-)=Q$. Since that is true for all $Q\in\mathcal{M}_E(\Omega)$, it also holds for $Q\in\mathcal{M}_S(\Omega)$. Hence,

\[
H(Q)=\mathbb{E}^Q\left[h(Q^{\omega^-},P^{\omega^-})_{\mathcal{F}^0_1}\right]=\mathbb{E}^Q\left[h(\hat Q^{\omega^-},P^{\omega^-})_{\mathcal{F}^0_1}\right],
\tag{3.46}
\]

i.e. $H(Q)$ is linear in $Q$. □

In this chapter, we are proving the large deviation principle for Hawkes processes started with empty history, i.e. with probability measure $P^{\emptyset}$. But as time elapses, the Hawkes process generates points and that creates a new history. We need to understand how the history created affects the future. What we want to prove is some uniform estimates to the effect that if the past history is well controlled, then the new history will also be well controlled. This is essentially what the following Lemma 9 says. Consider the configuration of points starting from time 0 up to time $t$. We shift it by $t$ and denote that by $w_t$ such that $w_t\in\Omega^-$, where $\Omega^-$ is $\Omega$ restricted to $\mathbb{R}^-$. These notations will be used in Lemma 9.

Remark 2. At the very beginning of the chapter, we defined $\omega_t$. It should not be confused with $w_t$ in this section.

Lemma 9. For any $Q\in\mathcal{M}_E(\Omega)$ such that $H(Q)<\infty$ and any open neighborhood $N$ of $Q$, there exists some $K_\ell^-$ such that $\emptyset\in K_\ell^-$ and $Q(K_\ell^-)\to1$ as $\ell\to\infty$ and

\[
\liminf_{t\to\infty}\frac1t\inf_{w_0\in K_\ell^-}\log P^{w_0}(R_{t,\omega}\in N,\ w_t\in K_\ell^-)\ge-H(Q).
\tag{3.47}
\]

Proof. Let us abuse the notations a bit by defining



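\[
\lambda(\omega^-)=\lambda\left(\sum_{\tau\in\omega^-\cup\omega[0,s)}h(s-\tau)\right).
\tag{3.48}
\]

For any $t>0$, since $\lambda(\cdot)\ge c>0$ and $\lambda(\cdot)$ is Lipschitz with constant $\alpha$, we have

\[
\begin{aligned}
\log\frac{dP^{\omega^-}}{dP^{w_0}}\bigg|_{\mathcal{F}^0_t}
&=\int_0^t\lambda(w_0)-\lambda(\omega^-)\,ds+\int_0^t\log\left(\frac{\lambda(\omega^-)}{\lambda(w_0)}\right)dN_s \\
&\le\int_0^t|\lambda(w_0)-\lambda(\omega^-)|\,ds+\int_0^t\log\left(1+\frac{|\lambda(\omega^-)-\lambda(w_0)|}{\lambda(w_0)}\right)dN_s \\
&\le\int_0^t\alpha\sum_{\tau\in\omega^-\cup w_0}h(s-\tau)\,ds+\int_0^t\frac\alpha c\sum_{\tau\in\omega^-\cup w_0}h(s-\tau)\,dN_s.
\end{aligned}
\tag{3.49}
\]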



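Define

\[
K_\ell^-=\{\omega:N[-t,0](\omega)\le\ell(1+t),\ \forall t>0\}.
\tag{3.50}
\]

By the maximal ergodic theorem,

\[
Q((K_\ell^-)^c)=Q\left(\sup_{t>0}\frac{N[-t,0]}{t+1}>\ell\right)
\tag{3.51}
\]

\[
{}\le Q\left(\sup_{n\ge1,\,n\in\mathbb{N}}\frac{N[-n,0]}n>\ell\right)\le\frac{\mathbb{E}^Q[N[0,1]]}\ell\to0,
\tag{3.52}
\]

as $\ell\to\infty$. Thus $Q(K_\ell^-)\to1$ as $\ell\to\infty$.

Fix any $s>0$ and $\omega^-\in K_\ell^-$. Since $h$ is decreasing, $h'\le0$, and integration by parts shows that

\[
\begin{aligned}
\sum_{\tau\in\omega^-}h(s-\tau)&=\int_0^{\infty}h(s+\sigma)\,dN[-\sigma,0] \\
&=-\int_0^{\infty}N[-\sigma,0]\,h'(s+\sigma)\,d\sigma \\
&\le-\int_0^{\infty}\ell(1+\sigma)h'(s+\sigma)\,d\sigma \\
&=\ell h(s)+\ell\int_0^{\infty}h(s+\sigma)\,d\sigma \\
&=\ell h(s)+\ell H(s),
\end{aligned}
\tag{3.53}
\]

where $H(t)=\int_t^{\infty}h(s)\,ds$.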

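Therefore, uniformly for $\omega^-,w_0\in K_\ell^-$,

\[
\int_0^t\alpha\sum_{\tau\in\omega^-\cup w_0}h(s-\tau)\,ds\le2\ell\alpha\|h\|_{L^1}+2\ell\alpha u(t),
\tag{3.54}
\]

where $u(t)=\int_0^tH(s)\,ds$ and

\[
\int_0^t\frac\alpha c\sum_{\tau\in\omega^-\cup w_0}h(s-\tau)\,dN_s\le\frac{2\ell\alpha}c\int_0^t(h(s)+H(s))\,dN_s.
\tag{3.55}
\]

Define

\[
K^+_{\ell,t}=\left\{\omega:\frac{2\ell\alpha}c\int_0^t(h(s)+H(s))\,dN_s\le\ell^2(\|h\|_{L^1}+u(t))\right\}.
\tag{3.56}
\]

Then, uniformly in $t>0$,

\[
Q((K^+_{\ell,t})^c)\le\frac{2\alpha}{c\ell}\mathbb{E}^Q[N[0,1]]\to0,
\tag{3.57}
\]

as $\ell\to\infty$. Thus $\inf_{t>0}Q(K^+_{\ell,t})\to1$ as $\ell\to\infty$.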

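Hence, uniformly for $\omega^-,w_0\in K_\ell^-$ and $\omega\in K^+_{\ell,t}$,

\[
\log\frac{dP^{\omega^-}}{dP^{w_0}}\bigg|_{\mathcal{F}^0_t}\le2\ell\alpha\|h\|_{L^1}+2\ell\alpha u(t)+\ell^2(\|h\|_{L^1}+u(t))
\tag{3.58}
\]

\[
{}=C_1(\ell)+C_2(\ell)u(t),
\tag{3.59}
\]

where $C_1(\ell)=2\ell\alpha\|h\|_{L^1}+\ell^2\|h\|_{L^1}$ and $C_2(\ell)=2\ell\alpha+\ell^2$. Observe that

\[
\limsup_{t\to\infty}\frac{u(t)}t=\limsup_{t\to\infty}\frac1t\int_0^tH(s)\,ds=0.
\tag{3.60}
\]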
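The point of (3.60) is that $u(t)$ may be unbounded yet still sublinear, because $H(s)\to0$ whenever $h$ is integrable. A small illustration, under the assumption (made only for this example) that $h(t)=(1+t)^{-3/2}$: then $H(s)=2(1+s)^{-1/2}$ and $u(t)=4(\sqrt{1+t}-1)$, which grows without bound while $u(t)/t\to0$.

```python
import math

def u(t):
    """u(t) = int_0^t H(s) ds for the assumed kernel h(t) = (1 + t)^(-3/2).

    Here H(s) = 2 / sqrt(1 + s), so u(t) = 4 * (sqrt(1 + t) - 1).
    """
    return 4.0 * (math.sqrt(1.0 + t) - 1.0)

times = [10.0**k for k in range(1, 8)]      # t = 10, 100, ..., 10^7
values = [u(t) for t in times]              # increasing and unbounded
ratios = [u(t) / t for t in times]          # decreasing towards 0
```

For this kernel $\int_0^\infty th(t)\,dt=\infty$, so it would not satisfy the stronger moment condition used in Chapter 2, but it does satisfy the integrability assumed in this chapter.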



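Let $D_t=\{R_{t,\omega}\in N,\ w_t\in K_\ell^-\}$. Uniformly for $w_0\in K_\ell^-$,

\[
\begin{aligned}
P^{w_0}(D_t)&\ge e^{-t(H(Q)+\epsilon)-C_1(\ell)-C_2(\ell)u(t)} \\
&\quad\cdot Q\left(D_t\cap\left\{\frac1t\log\frac{dQ^{\omega^-}}{dP^{\omega^-}}\bigg|_{\mathcal{F}^0_t}\le H(Q)+\epsilon\right\}\cap\left\{\log\frac{dP^{\omega^-}}{dP^{w_0}}\bigg|_{\mathcal{F}^0_t}\le C_1(\ell)+C_2(\ell)u(t)\right\}\right) \\
&\ge e^{-t(H(Q)+\epsilon)-C_1(\ell)-C_2(\ell)u(t)} \\
&\quad\cdot Q\left(D_t\cap\left\{\frac1t\log\frac{dQ^{\omega^-}}{dP^{\omega^-}}\bigg|_{\mathcal{F}^0_t}\le H(Q)+\epsilon\right\}\cap\left\{K^+_{\ell,t}\cap K_\ell^-\right\}\right).
\end{aligned}
\tag{3.61}
\]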



Since $Q\in\mathcal{M}_E(\Omega)$, by the ergodic theorem,

\[
\lim_{t\to\infty}Q(R_{t,\omega}\in N)=1,
\tag{3.62}
\]

and since $\psi(\omega,t)=\log\frac{dQ^{\omega^-}}{dP^{\omega^-}}\Big|_{\mathcal{F}^0_t}$ satisfies

\[
\psi(\omega,t+s)=\psi(\omega,t)+\psi(\theta_t\omega,s),\qquad\mathbb{E}^Q[\psi(\omega,t)]=tH(Q),
\tag{3.63}
\]

for almost every $\omega$ under $Q$,

\[
\lim_{t\to\infty}\frac1t\log\frac{dQ^{\omega^-}}{dP^{\omega^-}}\bigg|_{\mathcal{F}^0_t}=H(Q).
\tag{3.64}
\]

$Q$ is stationary, so $Q(w_t\in K_\ell^-)=Q(K_\ell^-)\to1$ as $\ell\to\infty$. Also, $Q(K^+_{\ell,t})\ge\inf_{t>0}Q(K^+_{\ell,t})\to1$ as $\ell\to\infty$. Remember that $\limsup_{t\to\infty}\frac{u(t)}t=0$. By choosing $\ell$ big enough, we conclude that

\[
\liminf_{t\to\infty}\frac1t\inf_{w_0\in K_\ell^-}\log P^{w_0}(R_{t,\omega}\in N,\ w_t\in K_\ell^-)\ge-H(Q)-\epsilon.
\tag{3.65}
\]

Since it holds for any $\epsilon>0$, we get the desired result.

□

Theorem 12 (Lower Bound). For any open set $G$,

\[
\liminf_{t\to\infty}\frac1t\log\mathbb{P}(R_{t,\omega}\in G)\ge-\inf_{Q\in G}H(Q).
\tag{3.66}
\]



Proof. It is sufficient to prove that for any $Q\in\mathcal{M}_S(\Omega)$ with $H(Q)<\infty$ and for any neighborhood $N$ of $Q$, $\liminf_{t\to\infty}\frac1t\log\mathbb{P}(R_{t,\omega}\in N)\ge-H(Q)$. Since for every invariant measure $P\in\mathcal{M}_S$, there exists a probability measure $\mu_P$ on the space $\mathcal{M}_E$ of ergodic measures such that $P=\int_{\mathcal{M}_E}Q\,\mu_P(dQ)$, for any $Q\in\mathcal{M}_S(\Omega)$ such that $H(Q)<\infty$, without loss of generality, we can assume that $Q=\sum_{j=1}^{\ell}\alpha_jQ_j$, where $\alpha_j\ge0$, $1\le j\le\ell$ and $\sum_{j=1}^{\ell}\alpha_j=1$. By linearity of $H(\cdot)$, $H(Q)=\sum_{j=1}^{\ell}\alpha_jH(Q_j)$. Divide the interval $[0,t]$ into subintervals of length $\alpha_jt$, let $t_j$, $1\le j\le\ell$ be the right hand endpoints of these subintervals, and let $t_0=0$. For each $Q_j$, take $K_M^-$ as in Lemma 9. We have $\min_{1\le j\le\ell}Q_j(K_M^-)\to1$, as $M\to\infty$. Choose neighborhoods $N_j$ of $Q_j$, $1\le j\le\ell$ such that $\sum_{j=1}^{\ell}\alpha_jN_j\subseteq N$. We have

\[
\begin{aligned}
\mathbb{P}(R_{t,\omega}\in N)&\ge\mathbb{P}(R_{t_1,\omega}\in N_1,\ w_{t_1}\in K_M^-) \\
&\quad\cdot\prod_{j=2}^{\ell}\inf_{w_{t_{j-1}-t_{j-2}}\in K_M^-}P^{w_{t_{j-1}-t_{j-2}}}(R_{t_j-t_{j-1},\omega}\in N_j,\ w_{t_j-t_{j-1}}\in K_M^-).
\end{aligned}
\tag{3.67}
\]

Now, applying Lemma 9 and the linearity of $H(\cdot)$,

\[
\liminf_{t\to\infty}\frac1t\log\mathbb{P}(R_{t,\omega}\in N)\ge-\sum_{j=1}^{\ell}\alpha_jH(Q_j)=-H(Q).
\tag{3.68}
\]

□

3.3 Upper Bound 

Remark 3. By following the argument in Donsker and Varadhan [31], if $\omega^-\mapsto P^{\omega^-}$ is weakly continuous, then

\[
\limsup_{t\to\infty}\frac1t\log\mathbb{P}(R_{t,\omega}\in A)\le-\inf_{Q\in A}H(Q),
\tag{3.69}
\]

for any compact $A$. If the Hawkes process has finite range of memory, i.e. $h(\cdot)$ has compact support, and if it is continuous, then, for any $a<b$, if $\omega_n^-\to\omega^-$, we have

\[
\left|\int_a^b\lambda(\omega_n^-,\omega,s)\,ds-\int_a^b\lambda(\omega^-,\omega,s)\,ds\right|\le\alpha\int_a^b\left|\sum_{\tau\in\omega_n^-}h(s-\tau)-\sum_{\tau\in\omega^-}h(s-\tau)\right|ds\to0,
\tag{3.70}
\]

as $n\to\infty$, which implies that $P^{\omega_n^-}\to P^{\omega^-}$.

If the Hawkes process does not have finite range of memory, then we should use the specific features of the Hawkes process to obtain the upper bound.

Before we proceed, let us prove an easy but very useful lemma that we will use 
repeatedly in the proofs of the estimates in this chapter. 

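Lemma 10. Let $f(\omega,s)$ be $\mathcal{F}^{-\infty}_s$ progressively measurable and predictable. Then,

\[
\mathbb{E}\left[e^{\int_0^tf(\omega,s)\,dN_s}\right]\le\mathbb{E}\left[e^{\int_0^t(e^{2f(\omega,s)}-1)\lambda(\omega,s)\,ds}\right]^{1/2}.
\tag{3.71}
\]

Proof. Since $\exp\left\{\int_0^t2f(\omega,s)\,dN_s-\int_0^t(e^{2f(\omega,s)}-1)\lambda(\omega,s)\,ds\right\}$ is a martingale, by the Cauchy-Schwarz inequality,

\[
\begin{aligned}
\mathbb{E}\left[e^{\int_0^tf(\omega,s)\,dN_s}\right]
&=\mathbb{E}\left[e^{\frac12\int_0^t2f(\omega,s)\,dN_s-\frac12\int_0^t(e^{2f(\omega,s)}-1)\lambda(\omega,s)\,ds+\frac12\int_0^t(e^{2f(\omega,s)}-1)\lambda(\omega,s)\,ds}\right] \\
&\le\mathbb{E}\left[e^{\int_0^t(e^{2f(\omega,s)}-1)\lambda(\omega,s)\,ds}\right]^{1/2}.
\end{aligned}
\tag{3.72}
\]

□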
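In the simplest special case the bound of Lemma 10 can be verified in closed form. The sketch below assumes (purely for illustration) a homogeneous Poisson process with constant rate and a constant test function $f$; then $\mathbb{E}[e^{fN_t}]=\exp\{\lambda t(e^f-1)\}$, the right-hand side of (3.71) is $\exp\{\frac12\lambda t(e^{2f}-1)\}$, and the gap between the two exponents is $\frac12\lambda t(e^f-1)^2\ge0$.

```python
import math

def log_left_side(f, rate, t):
    """log E[exp(f * N_t)] for a Poisson process with constant rate."""
    return rate * t * (math.exp(f) - 1.0)

def log_right_side(f, rate, t):
    """log of E[exp(int_0^t (e^{2f} - 1) * rate ds)]^{1/2} for constant f and rate."""
    return 0.5 * rate * t * (math.exp(2.0 * f) - 1.0)

rate, t = 2.0, 3.0                                   # arbitrary example values
f_values = [-3.0 + 0.05 * k for k in range(121)]     # f ranging over [-3, 3]
gaps = [log_right_side(f, rate, t) - log_left_side(f, rate, t) for f in f_values]
smallest_gap = min(gaps)                             # attained near f = 0
```

Working with logarithms avoids overflow for large $f$; the comparison is equivalent because the exponential is increasing.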
Define $C_T$ as

\[
C_T=\left\{F(\omega):=\int_0^Tf(\omega,s)\,dN_s-\int_0^T(e^{f(\omega,s)}-1)\lambda(\omega,s)\,ds,\quad f(\omega,s)\in B(\mathcal{F}^0_s)\cap C(\Omega\times\mathbb{R})\right\}.
\tag{3.73}
\]

Here $\lambda(\omega,s)$ is $\mathcal{F}^{-\infty}_s$ progressively measurable and predictable, and $f(\omega,s)\in B(\mathcal{F}^0_s)\cap C(\Omega\times\mathbb{R})$ means that $f$ is $\mathcal{F}^0_s$ progressively measurable, predictable and also bounded and continuous.

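Lemma 11. For any $T>0$ and $F\in C_T$, we have, for any $t>0$,

\[
\mathbb{E}^{P^{\emptyset}}\left[e^{\frac1T\int_0^tF(\theta_s\omega)\,ds}\right]\le1.
\tag{3.74}
\]

Proof. For any $t>0$, writing $\psi(s)=\sum_{k:\,s+kT\le t}F(\theta_{s+kT}\omega)$,

\[
\mathbb{E}^{P^{\emptyset}}\left[e^{\frac1T\int_0^tF(\theta_s\omega)\,ds}\right]=\mathbb{E}^{P^{\emptyset}}\left[e^{\frac1T\int_0^T\psi(s)\,ds}\right]\le\frac1T\int_0^T\mathbb{E}^{P^{\emptyset}}\left[e^{\psi(s)}\right]ds=1,
\tag{3.75}
\]

by Jensen's inequality and the fact that $\mathbb{E}^{P^{\emptyset}}[e^{\psi(s)}]=1$ by iteratively conditioning since $\mathbb{E}^{P^{\omega^-}}[e^{F(\omega)}]=1$ for any $\omega^-$. □

Remark 4. Under $P^{\emptyset}$, the $\mathcal{F}^{-\infty}_t$ progressively measurable rate function $\lambda$ is well defined since it only creates a history between time 0 and time $t$. Similarly, in the proof of Lemma 11, $\mathbb{E}^{P^{\omega^-}}[e^{F(\omega)}]=1$ for any $\omega^-$ should be interpreted as the expectation is 1 given any history created between time 0 and $t$, which is well defined.

Next, we need to compare $\frac1T\int_0^tF(\theta_s\omega_t)\,ds$ and $\frac1T\int_0^tF(\theta_s\omega)\,ds$.

Lemma 12. For any $q>0$, $T>0$ and $F\in C_T$,

\[
\limsup_{t\to\infty}\frac1t\log\mathbb{E}^{P^{\emptyset}}\left[\exp\left\{q\left|\frac1T\int_0^tF(\theta_s\omega_t)\,ds-\frac1T\int_0^tF(\theta_s\omega)\,ds\right|\right\}\right]=0.
\tag{3.76}
\]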



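Proof.

\[
\begin{aligned}
&\left|\frac1T\int_0^tF(\theta_s\omega_t)\,ds-\frac1T\int_0^tF(\theta_s\omega)\,ds\right| \\
&\quad\le\left|\frac1T\int_0^t\int_0^Tf(u,\theta_s\omega)\,dN_u\,ds-\frac1T\int_0^t\int_0^Tf(u,\theta_s\omega_t)\,dN_u\,ds\right| \\
&\qquad+\left|\frac1T\int_0^t\int_0^T(e^{f(u,\theta_s\omega)}-1)\lambda(\theta_s\omega,u)\,du\,ds-\frac1T\int_0^t\int_0^T(e^{f(u,\theta_s\omega_t)}-1)\lambda(\theta_s\omega_t,u)\,du\,ds\right|.
\end{aligned}
\tag{3.77}
\]

It is easy to see that $\int_0^Tf(u,\theta_s\omega)\,dN_u$ is $\mathcal{F}^s_{s+T}$-measurable and

\[
\int_0^Tf(u,\theta_s\omega)\,dN_u=\int_0^Tf(u,\theta_s\omega_t)\,dN_u
\tag{3.78}
\]

for any $0\le s\le t-T$. Hence,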



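\[
\begin{aligned}
&\left|\frac1T\int_0^t\int_0^Tf(u,\theta_s\omega)\,dN_u\,ds-\frac1T\int_0^t\int_0^Tf(u,\theta_s\omega_t)\,dN_u\,ds\right| \\
&\quad\le\frac1T\int_{t-T}^t\int_0^T|f(u,\theta_s\omega)|\,dN_u\,ds+\frac1T\int_{t-T}^t\int_0^T|f(u,\theta_s\omega_t)|\,dN_u\,ds \\
&\quad\le\frac{\|f\|_{L^\infty}}T\int_{t-T}^tN[s,s+T](\omega)\,ds+\frac{\|f\|_{L^\infty}}T\int_{t-T}^tN[s,s+T](\omega_t)\,ds \\
&\quad\le\|f\|_{L^\infty}\left[N[t-T,t+T](\omega)+N[t-T,t+T](\omega_t)\right] \\
&\quad\le\|f\|_{L^\infty}\left[N[t-T,t+T](\omega)+N[t-T,t](\omega)+N[0,T](\omega)\right].
\end{aligned}
\tag{3.79}
\]

By Hölder's inequality and Lemma 16, we have

\[
\begin{aligned}
&\limsup_{t\to\infty}\frac1t\log\mathbb{E}^{P^{\emptyset}}\left[e^{q\left|\frac1T\int_0^t\int_0^Tf(u,\theta_s\omega)\,dN_u\,ds-\frac1T\int_0^t\int_0^Tf(u,\theta_s\omega_t)\,dN_u\,ds\right|}\right] \\
&\quad\le\limsup_{t\to\infty}\frac1t\log\mathbb{E}^{P^{\emptyset}}\left[e^{q\|f\|_{L^\infty}\left[N[t-T,t+T](\omega)+N[t-T,t](\omega)+N[0,T](\omega)\right]}\right]=0.
\end{aligned}
\tag{3.80}
\]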



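Furthermore,

\[
\begin{aligned}
&\left|\frac1T\int_0^t\int_0^T(e^{f(u,\theta_s\omega)}-1)\lambda(\theta_s\omega,u)\,du\,ds-\frac1T\int_0^t\int_0^T(e^{f(u,\theta_s\omega_t)}-1)\lambda(\theta_s\omega_t,u)\,du\,ds\right| \\
&\quad\le\frac1T\int_0^t\int_0^T\left|e^{f(u,\theta_s\omega)}-e^{f(u,\theta_s\omega_t)}\right|\lambda(\theta_s\omega,u)\,du\,ds \\
&\qquad+\frac1T\int_0^t\int_0^T\left|e^{f(u,\theta_s\omega_t)}-1\right|\left|\lambda(\theta_s\omega_t,u)-\lambda(\theta_s\omega,u)\right|du\,ds.
\end{aligned}
\tag{3.81}
\]

For the first term,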



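\[
\begin{aligned}
&\frac1T\int_0^t\int_0^T\left|e^{f(u,\theta_s\omega)}-e^{f(u,\theta_s\omega_t)}\right|\lambda(\theta_s\omega,u)\,du\,ds \\
&\quad=\frac1T\int_{t-T}^t\int_0^T\left|e^{f(u,\theta_s\omega)}-e^{f(u,\theta_s\omega_t)}\right|\lambda(\theta_s\omega,u)\,du\,ds \\
&\quad\le\frac{2e^{\|f\|_{L^\infty}}}T\int_{t-T}^t\int_0^T\lambda(\theta_s\omega,u)\,du\,ds \\
&\quad=\frac{2e^{\|f\|_{L^\infty}}}T\int_{t-T}^t\int_0^T\lambda\Big(\sum_{\tau\in\omega[0,u+s)}h(u+s-\tau)\Big)du\,ds \\
&\quad\le2e^{\|f\|_{L^\infty}}TC_\epsilon+\frac{2e^{\|f\|_{L^\infty}}}T\epsilon\int_{t-T}^t\int_0^T\sum_{\tau\in\omega[0,u+s)}h(u+s-\tau)\,du\,ds \\
&\quad\le2e^{\|f\|_{L^\infty}}TC_\epsilon+\frac{2e^{\|f\|_{L^\infty}}}T\epsilon\int_{t-T}^t\int_0^TN[0,u+s]h(0)\,du\,ds \\
&\quad\le2e^{\|f\|_{L^\infty}}TC_\epsilon+2e^{\|f\|_{L^\infty}}\epsilon\int_{t-T}^tN[0,s+T]h(0)\,ds \\
&\quad\le2e^{\|f\|_{L^\infty}}TC_\epsilon+2e^{\|f\|_{L^\infty}}\epsilon T(N[0,t]+N[t,t+T])h(0).
\end{aligned}
\tag{3.82}
\]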



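Therefore,

(3.83)

\[
\limsup_{t\to\infty}\frac1t\log E^{P^{\emptyset}}\left[e^{\frac1T\int_0^t\int_0^T\left|e^{f(u,\theta_s\omega)}-e^{f(u,\theta_s\omega_t)}\right|\lambda(\theta_s\omega,u)\,du\,ds}\right]\le c(\epsilon),
\]

where $c(\epsilon)\to0$ as $\epsilon\to0$; in other words, it vanishes.

For the second term,

(3.84)

\begin{align*}
&\frac1T\int_0^t\int_0^T\left|e^{f(u,\theta_s\omega_t)}-1\right|\left|\lambda(\theta_s\omega_t,u)-\lambda(\theta_s\omega,u)\right|du\,ds\\
&\quad\le\frac{e^{\|f\|_{L^\infty}}+1}{T}\int_0^t\int_0^T\alpha\Big|\sum_{\tau\in\omega_t[0,u+s)\cup(\omega_t)^-}h(u+s-\tau)-\sum_{\tau\in\omega[0,u+s)}h(u+s-\tau)\Big|\,du\,ds\\
&\quad\le\frac{e^{\|f\|_{L^\infty}}+1}{T}\int_{t-T}^t\int_0^T\alpha\sum_{\tau\in\omega_t[0,u+s)}h(u+s-\tau)\,du\,ds\\
&\qquad+\frac{e^{\|f\|_{L^\infty}}+1}{T}\int_{t-T}^t\int_0^T\alpha\sum_{\tau\in\omega[0,u+s)}h(u+s-\tau)\,du\,ds\\
&\qquad+\frac{e^{\|f\|_{L^\infty}}+1}{T}\int_0^t\int_0^T\alpha\sum_{\tau\in(\omega_t)^-}h(u+s-\tau)\,du\,ds.
\end{align*}

Assume that $h(\cdot)$ is decreasing and $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$. By applying Jensen's inequality twice, we can estimate the second term above,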



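(3.85)

\begin{align*}
&E^{P^{\emptyset}}\left[e^{\frac{e^{\|f\|_{L^\infty}}+1}{T}\alpha\int_{t-T}^t\int_0^T\int_0^{u+s}h(u+s-v)\,dN_v\,du\,ds}\right]\\
&\quad\le\frac1T\int_{t-T}^tE^{P^{\emptyset}}\left[e^{(e^{\|f\|_{L^\infty}}+1)\alpha\int_0^T\int_0^{u+s}h(u+s-v)\,dN_v\,du}\right]ds\\
&\quad\le\frac1{T^2}\int_{t-T}^t\int_0^TE^{P^{\emptyset}}\left[e^{(e^{\|f\|_{L^\infty}}+1)\alpha T\int_0^{u+s}h(u+s-v)\,dN_v}\right]du\,ds\\
&\quad\le\frac1{T^2}\int_{t-T}^t\int_0^TE^{P^{\emptyset}}\left[e^{\int_0^{u+s}C(\alpha,T,h)\lambda(v)\,dv}\right]^{1/2}du\,ds\\
&\quad\le\frac1{T^2}\int_{t-T}^t\int_0^Te^{C(\alpha,T,h)C_\epsilon}E^{P^{\emptyset}}\left[e^{\epsilon C(\alpha,T,h)N[0,u+s]h(0)}\right]^{1/2}du\,ds\\
&\quad\le e^{C(\alpha,T,h)C_\epsilon}E^{P^{\emptyset}}\left[e^{\epsilon C(\alpha,T,h)N[0,t+T]h(0)}\right]^{1/2},
\end{align*}

where $C(\alpha,T,h)=\exp\left(2(e^{\|f\|_{L^\infty}}+1)\alpha Th(0)\right)-1$. Thus

(3.86)

\[
\limsup_{t\to\infty}\frac1t\log E^{P^{\emptyset}}\left[e^{\frac{e^{\|f\|_{L^\infty}}+1}{T}\alpha\int_{t-T}^t\int_0^T\int_0^{u+s}h(u+s-v)\,dN_v\,du\,ds}\right]=0.
\]

Similarly, we can estimate the first term.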

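For the third term, by Jensen's inequality, we have

(3.87)

\begin{align*}
&E^{P^{\emptyset}}\left[e^{\frac{e^{\|f\|_{L^\infty}}+1}{T}\int_0^t\int_0^T\alpha\sum_{\tau\in(\omega_t)^-}h(u+s-\tau)\,du\,ds}\right]\\
&\quad\le\frac1T\int_0^TE^{P^{\emptyset}}\left[e^{\alpha(\exp(\|f\|_{L^\infty})+1)\int_0^t\sum_{\tau\in(\omega_t)^-}h(u+s-\tau)\,ds}\right]du\\
&\quad\le E^{P^{\emptyset}}\left[e^{\alpha(\exp(\|f\|_{L^\infty})+1)\int_0^t\sum_{\tau\in(\omega_t)^-}h(s-\tau)\,ds}\right]\\
&\quad=E^{P^{\emptyset}}\left[e^{\alpha(\exp(\|f\|_{L^\infty})+1)\int_0^t\int_0^t\sum_{k=0}^\infty h(s+kt+t-u)\,dN_u\,ds}\right].
\end{align*}

Since $h(\cdot)$ is decreasing, $\int_{kt}^{(k+1)t}h(s)\,ds\ge t\,h((k+1)t)$. Thus

(3.88)

\[
\sum_{k=0}^\infty h(s+kt+t-u)\le h(s+t-u)+\frac1t\int_{s+t-u}^\infty h(v)\,dv.
\]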
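The comparison in (3.88) between a lattice sum and a tail integral can be checked numerically. The following is a minimal sketch under an illustrative assumption that is not part of the text: the kernel is taken to be $h(x)=e^{-x}$, so that $\int_a^\infty h(v)\,dv=e^{-a}$, and the argument `a` plays the role of $s+t-u$. The function names are hypothetical.

```python
import math

def lattice_sum(h, a, t, terms=10_000):
    """Left side of (3.88): sum over k >= 0 of h(a + k*t), truncated after `terms` terms."""
    return sum(h(a + k * t) for k in range(terms))

def integral_bound(h, tail_integral, a, t):
    """Right side of (3.88): h(a) + (1/t) * integral of h over [a, infinity)."""
    return h(a) + tail_integral(a) / t

# Illustrative decreasing, integrable kernel (an assumption, not from the text).
h = lambda x: math.exp(-x)
tail = lambda a: math.exp(-a)  # exact tail integral of exp(-x)

for a, t in [(0.0, 0.5), (1.0, 2.0), (3.0, 10.0)]:
    lhs = lattice_sum(h, a, t)
    rhs = integral_bound(h, tail, a, t)
    print(f"a={a}, t={t}: sum={lhs:.6f}, bound={rhs:.6f}")
```

The bound is only useful because the correction term carries the factor $1/t$, which is what makes the contribution of the periodised past negligible on the exponential scale.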



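Let $C(\alpha,f)=\alpha(\exp(\|f\|_{L^\infty})+1)$ and $H(t)=\int_t^\infty h(s)\,ds$. Then,

(3.89)

\begin{align*}
&E^{P^{\emptyset}}\left[e^{\alpha(\exp(\|f\|_{L^\infty})+1)\int_0^t\int_0^t\sum_{k=0}^\infty h(s+kt+t-u)\,dN_u\,ds}\right]\\
&\quad\le E^{P^{\emptyset}}\left[e^{C(\alpha,f)\int_0^t\int_0^t\frac1tH(s+t-u)\,dN_u\,ds+C(\alpha,f)\int_0^t\left[\int_0^th(s+t-u)\,ds\right]dN_u}\right]\\
&\quad=E^{P^{\emptyset}}\left[e^{\int_0^t\left[\frac{C(\alpha,f)}{t}\int_0^tH(s+t-u)\,ds\right]dN_u+\int_0^t\left[\int_0^tC(\alpha,f)h(s+t-u)\,ds\right]dN_u}\right].
\end{align*}

Notice that

(3.90)

\[
E^{P^{\emptyset}}\left[e^{\int_0^t\left[\frac{C(\alpha,f)}{t}\int_0^tH(s+t-u)\,ds\right]dN_u}\right]\le E^{P^{\emptyset}}\left[e^{\left[\frac{C(\alpha,f)}{t}\int_0^tH(s)\,ds\right]N_t}\right],
\]

where $\frac{C(\alpha,f)}{t}\int_0^tH(s)\,ds\to0$ as $t\to\infty$, which implies that

(3.91)

\[
\limsup_{t\to\infty}\frac1t\log E^{P^{\emptyset}}\left[e^{\int_0^t\left[\frac{C(\alpha,f)}{t}\int_0^tH(s+t-u)\,ds\right]dN_u}\right]=0.
\]

Moreover,

(3.92)

\begin{align*}
&E^{P^{\emptyset}}\left[e^{\int_0^t\left[\int_0^tC(\alpha,f)h(s+t-u)\,ds\right]dN_u}\right]\\
&\quad\le E^{P^{\emptyset}}\left[e^{\int_0^t\left(e^{2\int_0^tC(\alpha,f)h(s+t-u)\,ds}-1\right)\lambda(u)\,du}\right]^{1/2}\\
&\quad\le e^{\frac12C_\epsilon\int_0^t\left(e^{2\int_0^tC(\alpha,f)h(s+t-u)\,ds}-1\right)du}\,E^{P^{\emptyset}}\left[e^{\int_0^t\left(e^{2\int_0^tC(\alpha,f)h(s+t-u)\,ds}-1\right)\epsilon\sum_{\tau<u}h(u-\tau)\,du}\right]^{1/2}\\
&\quad\le e^{\frac12C_\epsilon\int_0^t\left(e^{2C(\alpha,f)H(t-u)}-1\right)du}\,E^{P^{\emptyset}}\left[e^{\int_0^t\left(e^{2C(\alpha,f)\|h\|_{L^1}}-1\right)\epsilon\sum_{\tau<u}h(u-\tau)\,du}\right]^{1/2}\\
&\quad\le e^{\frac12C_\epsilon\int_0^t\left(e^{2C(\alpha,f)H(u)}-1\right)du}\,E^{P^{\emptyset}}\left[e^{\epsilon\left(e^{2C(\alpha,f)\|h\|_{L^1}}-1\right)\|h\|_{L^1}N_t}\right]^{1/2}.
\end{align*}

Notice that it holds for any $\epsilon>0$ and that $\frac1t\int_0^t\left(e^{2C(\alpha,f)H(u)}-1\right)du\to0$ as $t\to\infty$, which implies

(3.93)

\[
\limsup_{t\to\infty}\frac1t\log E^{P^{\emptyset}}\left[e^{\int_0^t\left[\int_0^tC(\alpha,f)h(s+t-u)\,ds\right]dN_u}\right]=0.
\]

Putting all these things together and applying Hölder's inequality several times, we find that for any $q>0$, $T>0$ and $F\in\mathcal{C}_T$,

(3.94)

\[
\limsup_{t\to\infty}\frac1t\log E^{P^{\emptyset}}\left[\exp\left\{q\left|\frac1T\int_0^tF(\theta_s\omega_t)\,ds-\frac1T\int_0^tF(\theta_s\omega)\,ds\right|\right\}\right]=0.
\]

□

Lemma 13.

(3.95)

\[
\liminf_{T\to\infty}\frac1T\sup_{F\in\mathcal{C}_T}\int_\Omega F(\omega)\,Q(d\omega)\ge H(Q).
\]

Proof. Assume $H(Q)<\infty$. For any $\epsilon>0$, there exists some $f_\epsilon$ such that

(3.96)

\[
E^Q\left[\int_0^1f_\epsilon\,dN_s-\int_0^1\left(e^{f_\epsilon}-1\right)\lambda\,ds\right]\ge H(Q)-\epsilon.
\]

We can find a sequence $f_T\in B(\mathcal{F}^{-(T-1)}_s)\cap C(\Omega\times\mathbb{R})$ such that $f_T\to f_\epsilon$ as $T\to\infty$. By Fatou's lemma,

(3.97)

\begin{align*}
\liminf_{T\to\infty}\frac1T\sup_{F\in\mathcal{C}_T}\int_\Omega F(\omega)\,Q(d\omega)
&\ge\liminf_{T\to\infty}E^Q\left[\int_0^1f_T\,dN_s-\int_0^1\left(e^{f_T}-1\right)\lambda\,ds\right]\\
&\ge H(Q)-\epsilon.
\end{align*}

If $H(Q)=\infty$, then, for any $M>0$, there exists some $f_M$ such that

(3.98)

\[
E^Q\left[\int_0^1f_M\,dN_s-\int_0^1\left(e^{f_M}-1\right)\lambda\,ds\right]\ge M.
\]

Repeat the same argument as in the case that $H(Q)<\infty$.

□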



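Lemma 14. For any compact set $A$,

(3.99)

\[
\limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in A)\le-\inf_{Q\in A}H(Q).
\]

Proof. Notice that

(3.100)

\[
E^{P^{\emptyset}}\left[e^{N[0,t]}\right]\le E^{P^{\emptyset}}\left[e^{(e^2-1)\int_0^t\lambda(s)\,ds}\right]^{1/2}\le E^{P^{\emptyset}}\left[e^{(e^2-1)\epsilon h(0)N[0,t]+C_\epsilon(e^2-1)t}\right]^{1/2}.
\]

By choosing $\epsilon>0$ small enough, we have $E^{P^{\emptyset}}[e^{N[0,t]}]\le e^{Ct}$ for some constant $C>0$. Therefore

(3.101)

\[
\limsup_{\ell\to\infty}\limsup_{t\to\infty}\frac1t\log P\left(N[0,t]\ge\ell t\right)=-\infty,
\]

which implies (by comparing $\int_\Omega N[0,1]\,dR_{t,\omega}$ and $N[0,t]/t$ and the superexponential estimates in Lemma 16)

(3.102)

\[
\limsup_{\ell\to\infty}\limsup_{t\to\infty}\frac1t\log P\left(\int_\Omega N[0,1]\,dR_{t,\omega}\ge\ell\right)=-\infty.
\]

Therefore, we need only consider compact sets $A$ such that $E^Q[N[0,1]]<\infty$ for any $Q\in A$.

Now for any compact $A$ consisting of $Q$ with $E^Q[N[0,1]]<\infty$, for any $F\in\mathcal{C}_T$ and for any $p,q>1$ with $\frac1p+\frac1q=1$, by Hölder's inequality, Chebychev's inequality, and Lemma 11,

(3.103)

\begin{align*}
P(R_{t,\omega}\in A)
&\le E^{P^{\emptyset}}\left[e^{\frac1{pT}\int_0^tF(\theta_s\omega_t)\,ds}\right]\cdot\exp\left\{-\frac{t}{pT}\inf_{Q\in A}\int_\Omega F(\omega)\,Q(d\omega)\right\}\\
&\le E^{P^{\emptyset}}\left[e^{\frac1T\int_0^tF(\theta_s\omega)\,ds}\right]^{1/p}E^{P^{\emptyset}}\left[e^{\frac{q}{pT}\left|\int_0^tF(\theta_s\omega_t)\,ds-\int_0^tF(\theta_s\omega)\,ds\right|}\right]^{1/q}\\
&\qquad\cdot\exp\left\{-\frac{t}{pT}\inf_{Q\in A}\int_\Omega F(\omega)\,Q(d\omega)\right\}\\
&\le E^{P^{\emptyset}}\left[e^{\frac{q}{pT}\left|\int_0^tF(\theta_s\omega_t)\,ds-\int_0^tF(\theta_s\omega)\,ds\right|}\right]^{1/q}\cdot\exp\left\{-\frac{t}{pT}\inf_{Q\in A}\int_\Omega F(\omega)\,Q(d\omega)\right\}.
\end{align*}

By Lemma 12,

(3.104)

\[
\limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in A)\le-\frac1p\inf_{Q\in A}\frac1T\int_\Omega F(\omega)\,Q(d\omega).
\]

Since it holds for any $p>1$, we get

(3.105)

\[
\limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in A)\le-\inf_{Q\in A}\frac1T\int_\Omega F(\omega)\,Q(d\omega).
\]

For any compact $A$, given $Q\in A$ and $\epsilon>0$, by Lemma 13, there exist $T_Q>0$ and $F_Q\in\mathcal{C}_{T_Q}$ such that $\frac1{T_Q}\int_\Omega F_Q(\omega)\,Q(d\omega)\ge\inf_{Q\in A}H(Q)-\frac12\epsilon$. Since the linear integral is a continuous functional of $Q$ (see the proof of Lemma 7), there exists a neighborhood $G_Q$ of $Q$ such that $\frac1{T_Q}\int_\Omega F_Q(\omega)\,Q'(d\omega)\ge\inf_{Q\in A}H(Q)-\epsilon$ for all $Q'\in G_Q$. Since $A$ is compact, there exist $G_{Q_1},\ldots,G_{Q_\ell}$ such that $A\subset\bigcup_{j=1}^\ell G_{Q_j}$. Hence

(3.106)

\[
\inf_{1\le j\le\ell}\sup_{T>0}\sup_{F\in\mathcal{C}_T}\inf_{Q\in G_{Q_j}}\frac1T\int_\Omega F(\omega)\,Q(d\omega)\ge\inf_{Q\in A}H(Q)-\epsilon.
\]

Note that for any $A$ and $B$,

(3.107)

\[
\limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in A\cup B)\le\max\left\{\limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in A),\ \limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in B)\right\}.
\]

Thus, for $A\subset\bigcup_{j=1}^\ell G_{Q_j}$,

(3.108)

\[
\limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in A)\le-\inf_{1\le j\le\ell}\sup_{T>0}\sup_{F\in\mathcal{C}_T}\inf_{Q\in G_{Q_j}}\frac1T\int_\Omega F(\omega)\,Q(d\omega),
\]

whence $\limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in A)\le-\inf_{Q\in A}H(Q)$ for any compact $A$. □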

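Theorem 13 (Upper Bound). For any closed set $C$,

(3.109)

\[
\limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in C)\le-\inf_{Q\in C}H(Q).
\]

Proof. For any closed set $C$ and the compact set $\mathcal{A}^n$ defined in Lemma 21, we have

(3.110)

\[
\limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in C)\le\max\left\{\limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in C\cap\mathcal{A}^n),\ \limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in(\mathcal{A}^n)^c)\right\}.
\]

Since $C\cap\mathcal{A}^n$ is compact, Lemma 14 implies

(3.111)

\[
\limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in C\cap\mathcal{A}^n)\le-\inf_{Q\in C\cap\mathcal{A}^n}H(Q)\le-\inf_{Q\in C}H(Q).
\]

Furthermore, by Lemma 20,

(3.112)

\begin{align*}
&\limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in(\mathcal{A}^n)^c)
=\limsup_{t\to\infty}\frac1t\log P\Big(R_{t,\omega}\in\bigcup_{j=n}^\infty\mathcal{A}^c_{\frac1j,j,j}\Big)\\
&\quad\le\max_{j\ge n}\max\bigg\{\limsup_{t\to\infty}\frac1t\log P\left(\frac1t\int_0^t\chi_{N[0,1]\ge j}(\theta_s\omega_t)\,ds>\epsilon(j)\right),\\
&\qquad\limsup_{t\to\infty}\frac1t\log P\left(\frac1t\int_0^t\chi_{N[0,1/j]\ge2}(\theta_s\omega_t)\,ds>(1/j)g(1/j)\right),\\
&\qquad\limsup_{t\to\infty}\frac1t\log P\left(\frac1t\int_0^tN[0,1]\chi_{N[0,1]\ge j}(\theta_s\omega_t)\,ds>m(j)\right)\bigg\}\to-\infty
\end{align*}

as $n\to\infty$. Hence,

(3.113)

\[
\limsup_{t\to\infty}\frac1t\log P(R_{t,\omega}\in C)\le-\inf_{Q\in C}H(Q).
\]

□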



3.4 Superexponential Estimates 

In order to get the full large deviation principle, we need the upper bound 
inequality valid for any closed set instead of for any compact set, which requires 
some superexponential estimates. 

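Lemma 15. For any $q>0$,

(3.114)

\[
\limsup_{t\to\infty}\frac1t\log E^{P^{\emptyset}}\left[e^{q\int_0^th(t-s)\,dN_s}\right]=0.
\]

Proof.

(3.115)

\begin{align*}
E^{P^{\emptyset}}\left[e^{q\int_0^th(t-s)\,dN_s}\right]
&\le E^{P^{\emptyset}}\left[e^{\int_0^t\left(e^{2qh(t-s)}-1\right)\lambda\left(\sum_{0<\tau<s}h(s-\tau)\right)ds}\right]^{1/2}\\
&\le E^{P^{\emptyset}}\left[e^{\left(C_\epsilon+h(0)\epsilon N_t\right)\int_0^t\left(e^{2qh(t-s)}-1\right)ds}\right]^{1/2}.
\end{align*}

Note that $\int_0^t\left(e^{2qh(t-s)}-1\right)ds=\int_0^t\left(e^{2qh(s)}-1\right)ds$, and $e^{2qh}-1\in L^1$ since $h\in L^1$. Therefore,

(3.116)

\[
\limsup_{t\to\infty}\frac1t\log E^{P^{\emptyset}}\left[e^{q\int_0^th(t-s)\,dN_s}\right]\le c(\epsilon),
\]

where $c(\epsilon)\to0$ as $\epsilon\to0$. Since it holds for any $\epsilon$, we get the desired result. □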



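Lemma 16. For any $q>0$ and $T>0$,

(3.117)

\[
\limsup_{t\to\infty}\frac1t\log E^{P^{\emptyset}}\left[e^{qN[t,t+T]}\right]=0.
\]

Therefore, for any $\epsilon>0$,

(3.118)

\[
\limsup_{t\to\infty}\frac1t\log P\left(N[t,t+T]\ge\epsilon t\right)=-\infty.
\]

Proof. By Hölder's inequality,

(3.119)

\begin{align*}
E^{P^{\emptyset}}\left[e^{qN[t,t+T]}\right]
&\le E^{P^{\emptyset}}\left[e^{(e^{2q}-1)\int_t^{t+T}\lambda\left(\sum_{0<\tau<s}h(s-\tau)\right)ds}\right]^{1/2}\\
&\le e^{\frac12(e^{2q}-1)C_\epsilon T}\cdot E^{P^{\emptyset}}\left[e^{\epsilon(e^{2q}-1)h(0)N[t,t+T]+\epsilon(e^{2q}-1)\int_0^th(t-s)\,dN_s}\right]^{1/2}\\
&\le e^{\frac12(e^{2q}-1)C_\epsilon T}\cdot E^{P^{\emptyset}}\left[e^{2\epsilon(e^{2q}-1)h(0)N[t,t+T]}\right]^{1/4}E^{P^{\emptyset}}\left[e^{2\epsilon(e^{2q}-1)\int_0^th(t-s)\,dN_s}\right]^{1/4}.
\end{align*}

Choose $\epsilon<q\left[2(e^{2q}-1)h(0)\right]^{-1}$. Then

(3.120)

\[
E^{P^{\emptyset}}\left[e^{qN[t,t+T]}\right]^{3/4}\le e^{\frac12(e^{2q}-1)C_\epsilon T}\cdot E^{P^{\emptyset}}\left[e^{2\epsilon(e^{2q}-1)\int_0^th(t-s)\,dN_s}\right]^{1/4}.
\]

Lemma 15 completes the proof.

□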



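Lemma 17. We have the following superexponential estimates.
(i) For any $\epsilon>0$,

(3.121)

\[
\limsup_{\delta\to0}\limsup_{t\to\infty}\frac1t\log P\left(\frac1{\delta t}\int_0^t\chi_{N[0,\delta]\ge2}(\theta_s\omega)\,ds\ge\epsilon\right)=-\infty.
\]

(ii) For any $\epsilon>0$,

(3.122)

\[
\limsup_{M\to\infty}\limsup_{t\to\infty}\frac1t\log P\left(\frac1t\int_0^t\chi_{N[0,1]\ge M}(\theta_s\omega)\,ds\ge\epsilon\right)=-\infty.
\]

(iii) For any $\epsilon>0$,

(3.123)

\[
\limsup_{\ell\to\infty}\limsup_{t\to\infty}\frac1t\log P\left(\frac1t\int_0^tN[0,1]\chi_{N[0,1]\ge\ell}(\theta_s\omega)\,ds\ge\epsilon\right)=-\infty.
\]

Proof. (i) Define

(3.124)

\[
N_{\ell'}[0,t]=\int_0^t\chi_{\lambda(s)<\ell'}\,dN_s,\qquad\hat N_{\ell'}[0,t]=\int_0^t\chi_{\lambda(s)\ge\ell'}\,dN_s.
\]

Then $N[0,t]=N_{\ell'}[0,t]+\hat N_{\ell'}[0,t]$, and $N_{\ell'}[0,t]$ has compensator $\int_0^t\lambda(s)\chi_{\lambda(s)<\ell'}\,ds$ while $\hat N_{\ell'}[0,t]$ has compensator $\int_0^t\lambda(s)\chi_{\lambda(s)\ge\ell'}\,ds$. Notice that

(3.125)

\[
\chi_{N[0,\delta]\ge2}\le\chi_{N_{\ell'}[0,\delta]\ge2}+\chi_{\hat N_{\ell'}[0,\delta]\ge1}.
\]

It is clear that $N_{\ell'}$ is dominated by the usual Poisson process with rate $\ell'$. By Lemma 18,

(3.126)

\[
\limsup_{\delta\to0}\limsup_{t\to\infty}\frac1t\log P\left(\frac1{\delta t}\int_0^t\chi_{N_{\ell'}[0,\delta]\ge2}(\theta_s\omega)\,ds\ge\frac\epsilon2\right)=-\infty.
\]

On the other hand,

(3.127)

\begin{align*}
\frac1\delta\int_0^t\chi_{\hat N_{\ell'}[0,\delta]\ge1}(\theta_s\omega)\,ds
&=\frac1\delta\int_0^t\chi_{\hat N_{\ell'}[s,s+\delta]\ge1}(\omega)\,ds\\
&\le\frac1\delta\int_0^t\hat N_{\ell'}[s,s+\delta]\,ds\\
&=\frac1\delta\int_\delta^{t+\delta}\hat N_{\ell'}[0,s]\,ds-\frac1\delta\int_0^t\hat N_{\ell'}[0,s]\,ds\\
&\le\hat N_{\ell'}[0,t]+N[t,t+\delta].
\end{align*}

By Lemma 16, we have

(3.128)

\[
\limsup_{t\to\infty}\frac1t\log P\left(\frac1tN[t,t+\delta]\ge\frac\epsilon4\right)=-\infty,
\]

for any $\delta>0$. Hence

(3.129)

\[
\limsup_{\delta\to0}\limsup_{t\to\infty}\frac1t\log P\left(\frac1tN[t,t+\delta]\ge\frac\epsilon4\right)=-\infty.
\]

Finally, for some positive $h(\ell')$ to be chosen later,

(3.130)

\begin{align*}
P\left(\frac1t\hat N_{\ell'}[0,t]\ge\frac\epsilon4\right)
&\le E\left[e^{h(\ell')\hat N_{\ell'}[0,t]}\right]e^{-th(\ell')\epsilon/4}\\
&\le E\left[e^{(e^{2h(\ell')}-1)\int_0^t\lambda(s)\chi_{\lambda(s)\ge\ell'}\,ds}\right]^{1/2}e^{-th(\ell')\epsilon/4}.
\end{align*}

Let $f(z)=\frac{z}{\lambda(z)}$. Then $f(z)\to\infty$ as $z\to\infty$. Let $Z_s=\sum_{\tau\in\omega[0,s]}h(s-\tau)$. Then, by the definition of $\lambda(s)$ and abusing the notation a little bit, we see that $\lambda(s)=\lambda(Z_s)$. Since $\lambda(\cdot)$ is increasing, its inverse function $\lambda^{-1}$ exists and $\lambda^{-1}(\ell')\to\infty$ as $\ell'\to\infty$. We have

(3.131)

\begin{align*}
E\left[e^{(e^{2h(\ell')}-1)\int_0^t\lambda(s)\chi_{\lambda(s)\ge\ell'}\,ds}\right]^{1/2}
&\le E\left[e^{(e^{2h(\ell')}-1)\int_0^t\lambda(Z_s)\chi_{Z_s\ge\lambda^{-1}(\ell')}\,ds}\right]^{1/2}\\
&\le E\left[e^{(e^{2h(\ell')}-1)\int_0^t\frac{Z_s}{\inf_{z\ge\lambda^{-1}(\ell')}f(z)}\,ds}\right]^{1/2}.
\end{align*}

It is clear that $\lim_{\ell'\to\infty}\inf_{z\ge\ell'}f(\lambda^{-1}(z))=\infty$. Choose

(3.132)

\[
h(\ell')=\frac12\log\left[\inf_{z\ge\ell'}f(\lambda^{-1}(z))+1\right].
\]

Then $h(\ell')\to\infty$ as $\ell'\to\infty$ and

(3.133)

\begin{align*}
E\left[e^{(e^{2h(\ell')}-1)\int_0^t\lambda(s)\chi_{\lambda(s)\ge\ell'}\,ds}\right]^{1/2}
&\le E\left[e^{\int_0^tZ_s\,ds}\right]^{1/2}\\
&=E\left[e^{\int_0^t\sum_{\tau\in\omega[0,s]}h(s-\tau)\,ds}\right]^{1/2}\\
&\le E\left[e^{\|h\|_{L^1}N_t}\right]^{1/2}.
\end{align*}

Hence,

(3.134)

\[
\limsup_{\ell'\to\infty}\limsup_{\delta\to0}\limsup_{t\to\infty}\frac1t\log P\left(\frac1t\hat N_{\ell'}[0,t]\ge\frac\epsilon4\right)=-\infty.
\]

(ii) It is easy to see that (iii) implies (ii).
(iii) Observe first that

(3.135)

\begin{align*}
N[s,s+1]\chi_{N[s,s+1]\ge\ell}
&\le N_{\ell'}[s,s+1]\chi_{N_{\ell'}[s,s+1]\ge\frac\ell2}+\hat N_{\ell'}[s,s+1]\chi_{\hat N_{\ell'}[s,s+1]\ge\frac\ell2}\\
&\quad+\frac\ell2\chi_{\hat N_{\ell'}[s,s+1]\ge\frac\ell2}+\frac\ell2\chi_{N_{\ell'}[s,s+1]\ge\frac\ell2}.
\end{align*}

For the first term, notice that $N_{\ell'}$ is dominated by a usual Poisson process with rate $\ell'$. Thus, by Lemma 19,

(3.136)

\[
\limsup_{\ell\to\infty}\limsup_{t\to\infty}\frac1t\log P\left(\frac1t\int_0^tN_{\ell'}[s,s+1]\chi_{N_{\ell'}[s,s+1]\ge\frac\ell2}(\omega)\,ds\ge\frac\epsilon4\right)=-\infty.
\]

For the second term, $\hat N_{\ell'}[s,s+1]\chi_{\hat N_{\ell'}[s,s+1]\ge\frac\ell2}\le\hat N_{\ell'}[s,s+1]$ and

(3.137)

\[
\int_0^t\hat N_{\ell'}[s,s+1]\,ds\le\hat N_{\ell'}[0,t]+N[t,t+1].
\]

By Lemma 16,

(3.138)

\[
\limsup_{t\to\infty}\frac1t\log P\left(\frac1tN[t,t+1]\ge\frac\epsilon8\right)=-\infty,
\]

and by the same argument as in (i),

(3.139)

\[
\limsup_{\ell'\to\infty}\limsup_{\ell\to\infty}\limsup_{t\to\infty}\frac1t\log P\left(\frac1t\hat N_{\ell'}[0,t]\ge\frac\epsilon8\right)=-\infty.
\]

For the third term, notice that

(3.140)

\[
\int_0^t\frac\ell2\chi_{\hat N_{\ell'}[s,s+1]\ge\frac\ell2}\,ds\le\int_0^t\hat N_{\ell'}[s,s+1]\chi_{\hat N_{\ell'}[s,s+1]\ge\frac\ell2}(\omega)\,ds.
\]

So we can get the same superexponential estimate as before. Finally, for the fourth term,

(3.141)

\[
\int_0^t\frac\ell2\chi_{N_{\ell'}[s,s+1]\ge\frac\ell2}\,ds\le\int_0^tN_{\ell'}[s,s+1]\chi_{N_{\ell'}[s,s+1]\ge\frac\ell2}(\omega)\,ds.
\]

We can get the same superexponential estimate as before. □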

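Lemma 18. Assume $N_t$ is a Poisson process with constant rate $\lambda$. Then for any $\epsilon>0$,

(3.142)

\[
\limsup_{\delta\to0}\limsup_{t\to\infty}\frac1t\log P\left(\frac1{\delta t}\int_0^t\chi_{N[s,s+\delta]\ge2}(\omega)\,ds\ge\epsilon\right)=-\infty.
\]

Proof. Let $f(\delta,\omega)=\frac{h(\delta)}{\delta}\chi_{N[0,\delta]\ge2}(\omega)$, where $h(\delta)$ is to be chosen later. By Jensen's inequality and stationarity and independence of increments of the Poisson process,

(3.143)

\begin{align*}
E\left[e^{\int_0^tf(\delta,\theta_s\omega)\,ds}\right]
&\le E\left[e^{\frac1\delta\int_0^\delta\sum_{j=0}^{[t/\delta]}\delta f(\delta,\theta_{s+j\delta}\omega)\,ds}\right]\\
&\le E\left[\frac1\delta\int_0^\delta e^{\sum_{j=0}^{[t/\delta]}\delta f(\delta,\theta_{s+j\delta}\omega)}\,ds\right]\\
&=E\left[e^{\sum_{j=0}^{[t/\delta]}\delta f(\delta,\theta_{j\delta}\omega)}\right]\\
&=E\left[e^{\delta f(\delta,\omega)}\right]^{[t/\delta]+1}\\
&=\left\{e^{h(\delta)}\left(1-e^{-\lambda\delta}-\lambda\delta e^{-\lambda\delta}\right)+e^{-\lambda\delta}+\lambda\delta e^{-\lambda\delta}\right\}^{[t/\delta]+1}\\
&\le\left(M'e^{h(\delta)}\lambda^2\delta^2+1\right)^{[t/\delta]+1},
\end{align*}

for some $M'>0$. Choose $h(\delta)=\log(1/\delta)$. Then

(3.144)

\[
E\left[e^{\int_0^tf(\delta,\theta_s\omega)\,ds}\right]\le\left(M'\lambda^2\delta+1\right)^{[t/\delta]+1}\le e^{Mt},
\]

for some $M>0$. Therefore, by Chebychev's inequality,

(3.145)

\[
\limsup_{t\to\infty}\frac1t\log P\left(\frac{h(\delta)}{\delta t}\int_0^t\chi_{N[s,s+\delta]\ge2}(\omega)\,ds\ge\epsilon h(\delta)\right)\le M-\epsilon h(\delta),
\]

which holds for any $\delta>0$. Letting $\delta\to0$, we get the desired result.

□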
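The key step in the proof of Lemma 18 is that, with $h(\delta)=\log(1/\delta)$, each block of length $\delta$ contributes a factor of size $1+O(\delta)$, so the product over roughly $t/\delta$ blocks grows at most like $e^{Mt}$. A minimal numerical sketch of this step follows; the function names are hypothetical, and the explicit constant $\lambda^2/2$ comes from the elementary bound $P(N[0,\delta]\ge2)\le(\lambda\delta)^2/2$, which is our own choice of $M'$ rather than one stated in the text.

```python
import math

def prob_at_least_two(lam, delta):
    """P(N[0, delta] >= 2) for a Poisson process with rate lam."""
    x = lam * delta
    return 1.0 - math.exp(-x) - x * math.exp(-x)

def block_mgf(lam, delta):
    """E[exp(h(delta) * 1{N[0,delta] >= 2})] with h(delta) = log(1/delta)."""
    p = prob_at_least_two(lam, delta)
    return p / delta + (1.0 - p)  # exp(h(delta)) equals 1/delta

def growth_rate(lam, delta):
    """(1/delta) * log(block_mgf): exponential growth rate per unit time."""
    return math.log(block_mgf(lam, delta)) / delta

lam = 3.0
for delta in (0.1, 0.01, 0.001):
    print(delta, growth_rate(lam, delta), lam ** 2 / 2)
```

The printed growth rate stays below $\lambda^2/2$ uniformly in $\delta$, while the Chebychev penalty $\epsilon h(\delta)=\epsilon\log(1/\delta)$ diverges as $\delta\to0$, which is the mechanism behind (3.145).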



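Lemma 19. Assume $N_t$ is a Poisson process with constant rate $\lambda$. Then for any $\epsilon>0$,

(3.146)

\[
\limsup_{\ell\to\infty}\limsup_{t\to\infty}\frac1t\log P\left(\frac1t\int_0^tN[0,1]\chi_{N[0,1]\ge\ell}(\theta_s\omega)\,ds\ge\epsilon\right)=-\infty.
\]

Proof. Let $h(\ell)$ be some function of $\ell$ to be chosen later. Following the same argument as in the proof of Lemma 18, we have

(3.147)

\begin{align*}
&P\left(h(\ell)\int_0^tN[0,1]\chi_{N[0,1]\ge\ell}(\theta_s\omega)\,ds\ge\epsilon h(\ell)t\right)\\
&\quad\le E\left[e^{h(\ell)\int_0^tN[0,1]\chi_{N[0,1]\ge\ell}(\theta_s\omega)\,ds}\right]e^{-\epsilon h(\ell)t}\\
&\quad\le E\left[e^{h(\ell)N[0,1]\chi_{N[0,1]\ge\ell}}\right]^{[t]+1}e^{-\epsilon h(\ell)t}\\
&\quad=\left\{P(N[0,1]<\ell)+\sum_{k=\ell}^\infty e^{h(\ell)k}e^{-\lambda}\frac{\lambda^k}{k!}\right\}^{[t]+1}e^{-\epsilon h(\ell)t}\\
&\quad\le\left\{1+C_1\sum_{k=\ell}^\infty e^{h(\ell)k+\log(\lambda)k-\log(k)k}\right\}^{[t]+1}e^{-\epsilon h(\ell)t}\\
&\quad\le\left\{1+C_2e^{h(\ell)\ell+\log(\lambda)\ell-\log(\ell)\ell}\right\}^{[t]+1}e^{-\epsilon h(\ell)t}.
\end{align*}

Choosing $h(\ell)=(\log(\ell))^{1/2}$ will do the work.

□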
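The choice $h(\ell)=(\log\ell)^{1/2}$ works because the exponentially tilted Poisson tail $\sum_{k\ge\ell}e^{h(\ell)k}e^{-\lambda}\lambda^k/k!$ still tends to zero while $h(\ell)\to\infty$. A minimal numerical sketch of this is given below; the function name and the truncation after a fixed number of terms are our own assumptions, not part of the text.

```python
import math

def tilted_tail(lam, ell, terms=2000):
    """Sum over k >= ell of exp(h*k) * e^{-lam} * lam^k / k!, with h = sqrt(log(ell)).

    Each term is evaluated on the log scale via lgamma to avoid overflow;
    the sum is truncated after `terms` terms, which are negligible beyond that.
    """
    h = math.sqrt(math.log(ell))
    total = 0.0
    for k in range(ell, ell + terms):
        log_term = h * k - lam + k * math.log(lam) - math.lgamma(k + 1)
        total += math.exp(log_term)
    return total

for ell in (20, 50, 200):
    print(ell, math.sqrt(math.log(ell)), tilted_tail(1.0, ell))
```

The factorial in the denominator beats the tilt $e^{h(\ell)k}$ because $h(\ell)k$ grows more slowly than $k\log k$ for $k\ge\ell$.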



The following Lemma 20 provides the superexponential estimates that we need. These estimates have essentially been established in Lemma 17. The difference is that the statement of Lemma 17 uses $\omega$, whereas Lemma 20 uses $\omega_t$, which is what we need. Lemma 20 has three statements. Part (i) says that if you start with a sequence of simple point processes, the limiting point process may not be simple, but this has probability that is superexponentially small. Part (ii) is the usual superexponential estimate we would expect if $\mathcal{M}_S(\Omega)$ were equipped with the weak topology. But since we are using a strengthened weak topology with the convergence of the first moment as well, we will also need Part (iii).

Lemma 20. We have the following superexponential estimates.
(i) For some $g(\delta)\to0$ as $\delta\to0$,

(3.148)

\[
\limsup_{\delta\to0}\limsup_{t\to\infty}\frac1t\log P\left(\frac1{\delta t}\int_0^t\chi_{N[0,\delta]\ge2}(\theta_s\omega_t)\,ds\ge g(\delta)\right)=-\infty.
\]

(ii) For some $\epsilon(M)\to0$ as $M\to\infty$,

(3.149)

\[
\limsup_{M\to\infty}\limsup_{t\to\infty}\frac1t\log P\left(\frac1t\int_0^t\chi_{N[0,1]\ge M}(\theta_s\omega_t)\,ds\ge\epsilon(M)\right)=-\infty.
\]

(iii) For some $m(\ell)\to0$ as $\ell\to\infty$,

(3.150)

\[
\limsup_{\ell\to\infty}\limsup_{t\to\infty}\frac1t\log P\left(\frac1t\int_0^tN[0,1]\chi_{N[0,1]\ge\ell}(\theta_s\omega_t)\,ds\ge m(\ell)\right)=-\infty.
\]

Proof. We can replace the $\epsilon$ in the statement of Lemma 17 by $g(\delta)$, $\epsilon(M)$ and $m(\ell)$ by a standard analysis argument. Here, we can also replace the $\omega$ in Lemma 17 by $\omega_t$ since

(3.151)

\[
\left|\int_0^t\chi_{N[0,\delta]\ge2}(\theta_s\omega_t)\,ds-\int_0^t\chi_{N[0,\delta]\ge2}(\theta_s\omega)\,ds\right|\le2\delta,
\]

(3.152)

\[
\left|\int_0^t\chi_{N[0,1]\ge M}(\theta_s\omega_t)\,ds-\int_0^t\chi_{N[0,1]\ge M}(\theta_s\omega)\,ds\right|\le2,
\]

and

(3.153)

\begin{align*}
&\left|\int_0^tN[0,1]\chi_{N[0,1]\ge\ell}(\theta_s\omega_t)\,ds-\int_0^tN[0,1]\chi_{N[0,1]\ge\ell}(\theta_s\omega)\,ds\right|\\
&\quad\le\int_{t-1}^tN[s,s+1](\omega)\,ds+\int_{t-1}^tN[s,s+1](\omega_t)\,ds\\
&\quad\le N[t-1,t+1](\omega)+N[t-1,t+1](\omega_t)\\
&\quad=N[t-1,t+1](\omega)+N[t-1,t](\omega)+N[0,1](\omega).
\end{align*}

By Lemma 16, we have the superexponential estimate, for any $\epsilon>0$,

(3.154)

\[
\limsup_{t\to\infty}\frac1t\log P\left(\frac1t\left\{N[t-1,t+1](\omega)+N[t-1,t](\omega)+N[0,1](\omega)\right\}\ge\epsilon\right)=-\infty.
\]

□
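Lemma 21. For any $\delta,M>0$, $\ell>0$, define

(3.155)

\begin{align*}
\mathcal{A}_\delta&=\left\{Q\in\mathcal{M}_S(\Omega):Q(N[0,\delta]\ge2)\le\delta g(\delta)\right\},\\
\mathcal{A}_M&=\left\{Q\in\mathcal{M}_S(\Omega):Q(N[0,1]\ge M)\le\epsilon(M)\right\},\\
\mathcal{A}_\ell&=\left\{Q\in\mathcal{M}_S(\Omega):\int_{N[0,1]\ge\ell}N[0,1]\,dQ\le m(\ell)\right\},
\end{align*}

where $\epsilon(M)\to0$ as $M\to\infty$, $m(\ell)\to0$ as $\ell\to\infty$ and $g(\delta)\to0$ as $\delta\to0$. Let $\mathcal{A}_{\delta,M,\ell}=\mathcal{A}_\delta\cap\mathcal{A}_M\cap\mathcal{A}_\ell$ and

(3.156)

\[
\mathcal{A}^n=\bigcap_{j=n}^\infty\mathcal{A}_{\frac1j,j,j}.
\]

Then $\mathcal{A}^n$ is compact.

Proof. Observe that for $\beta>0$, the sets

(3.157)

\[
K_\beta=\bigcap_{k=1}^\infty\left\{\omega:\{N[-k,-(k-1)](\omega)\le\beta\ell_k\}\cap\{N[k-1,k](\omega)\le\beta\ell_k\}\right\}
\]

are relatively compact in $\Omega$. Let $\overline{K}_\beta$ be the closure of $K_\beta$, which is then compact.

For any $Q\in\mathcal{A}^n$, $Q(N[0,1]\ge M)\le\epsilon(M)$ for any $M\ge n$. We can choose $\beta$ big enough and an increasing sequence $\ell_k$ such that $\beta\ell_k\ge n$ and $\infty>\sum_{k=1}^\infty\epsilon(\beta\ell_k)\to0$ as $\beta\to\infty$. Then, uniformly for $Q\in\mathcal{A}^n$,

(3.158)

\begin{align*}
Q\left(\overline{K}_\beta^c\right)\le Q\left(K_\beta^c\right)
&=Q\left(\bigcup_{k=1}^\infty\{N[-k,-(k-1)](\omega)>\beta\ell_k\}\cup\{N[k-1,k](\omega)>\beta\ell_k\}\right)\\
&\le\sum_{k=1}^\infty\left\{Q(N[-k,-(k-1)]>\beta\ell_k)+Q(N[k-1,k]>\beta\ell_k)\right\}\\
&=2\sum_{k=1}^\infty Q(N[0,1]>\beta\ell_k)\\
&\le2\sum_{k=1}^\infty\epsilon(\beta\ell_k)\to0
\end{align*}

as $\beta\to\infty$. Therefore, $\mathcal{A}^n$ is tight in the weak topology and by Prokhorov's theorem $\mathcal{A}^n$ is precompact in the weak topology. In other words, for any sequence in $\mathcal{A}^n$, there exists a subsequence, say $Q_n$, such that $Q_n\to Q$ weakly as $n\to\infty$ for some $Q$. By the definition of $\mathcal{A}^n$, the $Q_n$ are uniformly integrable, which implies that $\int N[0,1]\,dQ_n\to\int N[0,1]\,dQ$ as $n\to\infty$. It is also easy to see that $\mathcal{A}^n$ is closed by checking that each $\mathcal{A}_{\frac1j,j,j}$ is closed. That implies that $Q\in\mathcal{A}^n$. Finally, we need to check that $Q$ is a simple point process. Let $I_{j,\delta}=[(j-1)\delta,j\delta]$. We have for any $Q\in\mathcal{A}^n$,

(3.159)

\begin{align*}
Q(\exists t:N[t-,t]\ge2)
&=Q\left(\bigcup_{k=1}^\infty\{\exists t\in[-k,k]:N[t-,t]\ge2\}\right)\\
&=Q\left(\bigcup_{k=1}^\infty\bigcap_{\delta>0}\bigcup_{j=-[k/\delta]+1}^{[k/\delta]}\{\omega:\#\{\omega\cap I_{j,\delta}\}\ge2\}\right)\\
&\le\sum_{k=1}^\infty\inf_{\delta=\frac1m,\,m\ge n}\sum_{j=-[k/\delta]+1}^{[k/\delta]}Q(\#\{\omega\cap I_{j,\delta}\}\ge2)\\
&\le\sum_{k=1}^\infty\inf_{\delta=\frac1m,\,m\ge n}\left\{2[k/\delta]\delta g(\delta)\right\}\\
&=0.
\end{align*}

Hence, $\mathcal{A}^n$ is precompact in our topology. Since $\mathcal{A}^n$ is closed, it is compact. □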

3.5 Concluding Remarks 

In this chapter, we obtained a process-level large deviation principle for a wide class of simple point processes, i.e. nonlinear Hawkes processes. Indeed, the methods and ideas should apply to other simple point processes as well and we should expect to get the same expression for the rate function $H(Q)$. For $H(Q)<\infty$, it should be of the form

(3.160)

\[
H(Q)=\int_\Omega\int_0^1\left[\lambda(\omega,s)-\hat\lambda(\omega,s)+\log\left(\frac{\hat\lambda(\omega,s)}{\lambda(\omega,s)}\right)\hat\lambda(\omega,s)\right]ds\,Q(d\omega),
\]

where $\lambda(\omega,s)$ is the intensity of the underlying simple point process. It is then natural to ask which conditions on a simple point process would guarantee the process-level large deviation principle that we obtained in this chapter. First, we have to assume that $\lambda(\omega,t)$ is predictable and progressively measurable. Second, in our proof of the upper bound in this chapter, the key assumption we used about nonlinear Hawkes processes was that $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$. That is crucial to guarantee the superexponential estimates we needed for the upper bound. If for a simple point process, we have $\lambda(\omega,t)\le F(N(t,\omega))$ for some sublinear function $F(\cdot)$, we would expect the superexponential estimates still to work for the upper bound. Third, it is not enough to have $\lambda(\omega,t)\le F(N(t,\omega))$ for sublinear $F(\cdot)$ to get the full large deviation principle. The reason is that in the proof of the lower bound, in particular in Lemma 9, we need to use the fact that any memory in $\lambda(\omega,t)$ will decay to zero over time. For nonlinear Hawkes processes, this is guaranteed by the assumption that $\int_0^\infty h(t)\,dt<\infty$, which is crucial in the proof of Lemma 9. Indeed, for any simple point process $P$, if you want to define $P^{\omega^-}$, the probability measure conditional on the past history $\omega^-$, then to make sense of it you need some regularity ensuring that the memory of the history decays to zero eventually over time. From this perspective, nonlinear Hawkes processes form a rich and ideal class for which the process-level large deviation principle holds.

Chapter 4

Large Deviations for Markovian 
Nonlinear Hawkes Processes 



In Chapter 3, we studied the large deviations for $(N_t/t\in\cdot)$ by proving first a process-level, i.e. level-3, large deviation principle and then applying the contraction principle. In this chapter, we will obtain an alternative expression for the rate function of the large deviation principle of $(N_t/t\in\cdot)$ when $h(\cdot)$ is exponential or a sum of exponentials. The main idea is that when $h(\cdot)$ is exponential or a sum of exponentials, the system is Markovian and we can use the Feynman-Kac formula to obtain an upper bound and a tilting method to get a lower bound. The assumption $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$ will provide us the compactness needed to apply a minmax theorem to match the lower bound and the upper bound.

4.1 An Ergodic Lemma

Let us prove an ergodic theorem first. Assume $h(t)=\sum_{i=1}^da_ie^{-b_it}$. Here $b_i>0$, and $a_i\neq0$ might be negative, but we assume that $h(t)>0$ for any $t>0$.

$Z_t=\sum_{\tau_j<t}h(t-\tau_j)=\sum_{i=1}^dZ_i(t)$, where $Z_i(t)=\sum_{\tau_j<t}a_ie^{-b_i(t-\tau_j)}$. The domain for $(Z_1(t),\ldots,Z_d(t))$ is $\mathcal{Z}:=\mathbb{R}^{\epsilon_1}\times\cdots\times\mathbb{R}^{\epsilon_d}$, where $\mathbb{R}^{\epsilon_i}:=\mathbb{R}^+$ or $\mathbb{R}^-$ depending on whether $\epsilon_i=+1$ or $-1$, where $\epsilon_i=+1$ if $a_i>0$ and $\epsilon_i=-1$ otherwise.
The generator $\mathcal{A}$ for $(Z_1(t),\ldots,Z_d(t))$ is given by

(4.1)

\[
\mathcal{A}f=-\sum_{i=1}^db_iz_i\frac{\partial f}{\partial z_i}+\lambda(z_1,\ldots,z_d)\left[f(z_1+a_1,\ldots,z_d+a_d)-f(z_1,\ldots,z_d)\right].
\]

We want to prove the existence and uniqueness of the invariant probability measure for $(Z_1(t),\ldots,Z_d(t))$. Here the invariance is in time.

The lecture notes [17] by Martin Hairer give a criterion for the existence and uniqueness of the invariant probability measure for Markov processes.

Suppose we have a jump diffusion process with generator $\mathcal{L}$. If we can find $u$ such that $u\ge0$ and $\mathcal{L}u\le C_1-C_2u$ for some constants $C_1,C_2>0$, then there exists an invariant probability measure. We thereby have the following lemma.

Lemma 22. Consider $h(t)=\sum_{i=1}^da_ie^{-b_it}>0$. Let $\epsilon_i=+1$ if $a_i>0$ and $\epsilon_i=-1$ if $a_i<0$. Assume $\lambda(z_1,\ldots,z_d)\le\sum_{i=1}^d\alpha_i|z_i|+\beta$, where $\beta>0$ and $\alpha_i>0$, $1\le i\le d$, satisfy $\sum_{i=1}^d\frac{|a_i|}{b_i}\alpha_i<1$. Then, there exists a unique invariant probability measure for $(Z_1(t),\ldots,Z_d(t))$.

Proof. Try $u(z_1,\ldots,z_d)=\sum_{i=1}^d\epsilon_ic_iz_i\ge0$, where $c_i>0$, $1\le i\le d$. Then,

(4.2)

\begin{align*}
\mathcal{A}u&=-\sum_{i=1}^db_i\epsilon_ic_iz_i+\lambda(z_1,\ldots,z_d)\sum_{i=1}^da_i\epsilon_ic_i\\
&\le-\sum_{i=1}^db_ic_i|z_i|+\sum_{i=1}^d\alpha_i|z_i|\sum_{i=1}^d|a_i|c_i+\beta\sum_{i=1}^d|a_i|c_i.
\end{align*}

Taking $c_i=\frac{\alpha_i}{b_i}>0$, we get

(4.3)

\begin{align*}
\mathcal{A}u&\le-\left(1-\sum_{j=1}^d\frac{|a_j|\alpha_j}{b_j}\right)\sum_{i=1}^d\alpha_i|z_i|+\beta\sum_{i=1}^d\frac{|a_i|\alpha_i}{b_i}\\
&\le-\min_{1\le i\le d}b_i\cdot\left(1-\sum_{j=1}^d\frac{|a_j|\alpha_j}{b_j}\right)u+\beta\sum_{i=1}^d\frac{|a_i|\alpha_i}{b_i}.
\end{align*}

Next, we will prove the uniqueness of the invariant probability measure. Consider the simplest case $h(t)=ae^{-bt}$. It is sufficient to prove that for any $x,y>0$, there exists some $T>0$ such that $\mathcal{P}^x(T,\cdot)$ and $\mathcal{P}^y(T,\cdot)$ are not mutually singular. Here $\mathcal{P}^x(T,\cdot)=P(Z^x_T\in\cdot)$, where $Z^x_T$ is $Z_T$ starting at $Z_0=x$, i.e.

\[
Z^x_T=xe^{-bT}+\sum_{\tau_j<T}ae^{-b(T-\tau_j)}.
\]

Let us assume that $x>y>0$. Conditioning on the event that $Z^x_T$ and $Z^y_T$ each have exactly one jump during the time interval $(0,T)$, the laws $\mathcal{P}^x(T,\cdot)$ and $\mathcal{P}^y(T,\cdot)$ are absolutely continuous with respect to some probability measures with positive density on the sets

(4.4)

\[
\left((a+x)e^{-bT},\,xe^{-bT}+a\right)\quad\text{and}\quad\left((a+y)e^{-bT},\,ye^{-bT}+a\right)
\]

respectively. Choosing $T>\frac1b\log\left(\frac{x-y+a}{a}\right)$, we have

(4.5)

\[
\left((a+x)e^{-bT},\,xe^{-bT}+a\right)\cap\left((a+y)e^{-bT},\,ye^{-bT}+a\right)\neq\emptyset,
\]

which implies that $\mathcal{P}^x(T,\cdot)$ and $\mathcal{P}^y(T,\cdot)$ are not mutually singular.

Similarly, we can prove the uniqueness of the invariant probability measure for the multidimensional case. We need to condition on the event that we have exactly $d$ jumps during the time interval $(0,T)$ for both $Z^x_T$ and $Z^y_T$, where $x,y\in\mathcal{Z}$. Then,

(4.6)

\[
Z^x_t=\left(Z^{x_1}_t,\ldots,Z^{x_d}_t\right)\in\mathcal{Z},
\]

where $Z^{x_i}_t=x_ie^{-b_it}+\sum_{\tau_j<t}a_ie^{-b_i(t-\tau_j)}$, $1\le i\le d$. Then, $\mathcal{P}^x(T,\cdot)$ and $\mathcal{P}^y(T,\cdot)$ are not mutually singular for sufficiently large $T$. □
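The threshold on $T$ in the one-dimensional uniqueness argument can be made concrete. The sketch below computes the two one-jump intervals of (4.4) and checks whether they intersect as in (4.5); the function names and the sample values of $x,y,a,b$ are our own illustrative assumptions.

```python
import math

def one_jump_interval(x, a, b, T):
    """Range of x*e^{-bT} + a*e^{-b(T - tau)} as the single jump time tau runs over (0, T)."""
    return ((a + x) * math.exp(-b * T), x * math.exp(-b * T) + a)

def overlap_threshold(x, y, a, b):
    """Value of T beyond which the two intervals in (4.4) intersect, assuming x > y > 0."""
    return math.log((x - y + a) / a) / b

def intervals_overlap(x, y, a, b, T):
    """True when the open intervals for starting points x and y have a common point."""
    lo_x, hi_x = one_jump_interval(x, a, b, T)
    lo_y, hi_y = one_jump_interval(y, a, b, T)
    return max(lo_x, lo_y) < min(hi_x, hi_y)

x, y, a, b = 5.0, 1.0, 0.5, 2.0
T0 = overlap_threshold(x, y, a, b)
print(T0, intervals_overlap(x, y, a, b, 0.9 * T0), intervals_overlap(x, y, a, b, 1.1 * T0))
```

Below the threshold the exponential decay has not yet brought the two starting points within one jump of each other, which is why $T$ has to be taken large.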

4.2 Large Deviations for Markovian Nonlinear 
Hawkes Processes with Exponential Exciting 
Function 

We assume first that $h(t)=ae^{-bt}$, where $a,b>0$, i.e. the process $Z_t$ jumps upwards by an amount $a$ at each point and decays exponentially between points with rate $b$. In this case, $Z_t$ is Markovian.

Notice first that $Z_0=0$ and

(4.7)

\[
dZ_t=-bZ_t\,dt+a\,dN_t,
\]

which implies that $N_t=\frac1aZ_t+\frac ba\int_0^tZ_s\,ds$.
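The identity $N_t=\frac1aZ_t+\frac ba\int_0^tZ_s\,ds$ can be checked pathwise on a simulated trajectory. The following is a minimal sketch, not the method of the text: it simulates the Markovian process by thinning, under the assumptions that $\lambda(\cdot)$ is nondecreasing and bounded below by a positive constant, and it uses the illustrative rate $\lambda(z)=1+\sqrt z$. The function name and parameter values are hypothetical.

```python
import math
import random

def simulate_markov_hawkes(lam, a, b, horizon, seed=0):
    """Simulate dZ = -b Z dt + a dN on [0, horizon], where N has intensity lam(Z_{t-}).

    Thinning is valid because lam is assumed nondecreasing and Z only decays
    between jumps, so lam(Z) at the current time dominates the intensity
    until the next accepted point.
    Returns (N_t, Z_t, integral of Z over [0, horizon]).
    """
    rng = random.Random(seed)
    t, z, n, integral = 0.0, 0.0, 0, 0.0
    while True:
        bound = lam(z)                      # dominating rate until the next candidate
        wait = rng.expovariate(bound)
        remaining = horizon - t
        step = min(wait, remaining)
        decay = math.exp(-b * step)
        integral += z * (1.0 - decay) / b   # exact integral of the exponential decay
        z *= decay
        if wait >= remaining:
            return n, z, integral
        t += step
        if rng.random() * bound <= lam(z):  # accept with probability lam(z) / bound
            z += a
            n += 1

a, b = 0.5, 2.0
n, z, integral = simulate_markov_hawkes(lambda v: 1.0 + math.sqrt(v), a, b, horizon=50.0)
print(n, z / a + (b / a) * integral)
```

The two printed numbers agree up to floating point rounding, since every decay step removes exactly $b$ times the accumulated integral from $Z$ and every accepted point adds $a$.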

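We prove first the existence of the limit of the logarithmic moment generating function of $N_t$.

Theorem 14. Assume that $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$ and that $\lambda(\cdot)$ is continuous and bounded below by some positive constant. Then,

(4.8)

\[
\lim_{t\to\infty}\frac1t\log E\left[e^{\theta N_t}\right]=\Gamma(\theta),
\]

where

(4.9)

\[
\Gamma(\theta)=\sup_{(\hat\lambda,\hat\pi)\in\mathcal{Q}_e}\left\{\int\frac{\theta b}{a}z\,\hat\pi(dz)+\int(\hat\lambda-\lambda)\,\hat\pi(dz)-\int\left(\log(\hat\lambda/\lambda)\right)\hat\lambda\,\hat\pi(dz)\right\},
\]

where $\mathcal{Q}_e$ is defined as

(4.10)

\[
\mathcal{Q}_e=\left\{(\hat\lambda,\hat\pi)\in\mathcal{Q}:\hat{\mathcal{A}}\text{ has unique invariant probability measure }\hat\pi\right\},
\]

where

(4.11)

\[
\mathcal{Q}=\left\{(\hat\lambda,\hat\pi):\hat\pi\in\mathcal{M}(\mathbb{R}^+),\ \int z\,\hat\pi(dz)<\infty,\ \hat\lambda\in L^1(\hat\pi),\ \hat\lambda>0\right\},
\]

where $\mathcal{M}(\mathbb{R}^+)$ denotes the space of probability measures on $\mathbb{R}^+$ and for any $\hat\lambda$ such that $(\hat\lambda,\hat\pi)\in\mathcal{Q}$, we define the generator $\hat{\mathcal{A}}$ as

(4.12)

\[
\hat{\mathcal{A}}f(z)=-bz\frac{\partial f}{\partial z}+\hat\lambda(z)\left[f(z+a)-f(z)\right],
\]

for any $f:\mathbb{R}^+\to\mathbb{R}$ that is $C^1$, i.e. continuously differentiable.

Proof. By Lemma 23, $E[e^{\theta N_t}]<\infty$ for any $\theta\in\mathbb{R}$; also

(4.13)

\[
E\left[e^{\theta N_t}\right]=E\left[e^{\frac\theta a\left(Z_t+b\int_0^tZ_s\,ds\right)}\right].
\]

Define the set

(4.14)

\[
\mathcal{U}_\theta=\left\{u\in C^1(\mathbb{R}^+,\mathbb{R}^+):u(z)=e^{f(z)},\text{ where }f\in\mathcal{F}\right\},
\]

where

(4.15)

\[
\mathcal{F}=\left\{f:f(z)=Kz+g(z)+L,\ K>\frac\theta a,\ K,L\in\mathbb{R},\ g\text{ is }C^1\text{ with compact support}\right\}.
\]

Now for any $u\in\mathcal{U}_\theta$, define

(4.16)

\[
M=\sup_{z\ge0}\frac{\mathcal{A}u(z)+\frac{\theta b}{a}zu(z)}{u(z)}.
\]

By Dynkin's formula, if $M<\infty$, for $V(z):=\frac{\theta b}{a}z$, we have

(4.17)

\begin{align*}
E\left[u(Z_t)e^{\int_0^tV(Z_s)\,ds}\right]
&=u(Z_0)+\int_0^tE\left[\left(\mathcal{A}u(Z_s)+V(Z_s)u(Z_s)\right)e^{\int_0^sV(Z_v)\,dv}\right]ds\\
&\le u(Z_0)+M\int_0^tE\left[u(Z_s)e^{\int_0^sV(Z_v)\,dv}\right]ds,
\end{align*}

which implies by Gronwall's lemma that

(4.18)

\[
E\left[u(Z_t)e^{\int_0^tV(Z_s)\,ds}\right]\le u(Z_0)e^{Mt}=u(0)e^{Mt}.
\]

Observe that by the definition of $\mathcal{U}_\theta$, for any $u\in\mathcal{U}_\theta$, we have $u(z)\ge c_1e^{\frac\theta az}$ for some constant $c_1>0$ and therefore by (4.13) and (4.18),

(4.19)

\[
E\left[e^{\theta N_t}\right]\le\frac1{c_1}E\left[u(Z_t)e^{\frac{\theta b}{a}\int_0^tZ_s\,ds}\right]\le\frac1{c_1}u(0)e^{Mt}.
\]

Hence,

(4.20)

\[
\limsup_{t\to\infty}\frac1t\log E\left[e^{\theta N_t}\right]\le M=\sup_{z\ge0}\frac{\mathcal{A}u(z)+\frac{\theta b}{a}zu(z)}{u(z)},
\]

which is still true even if $M=\infty$. Since this holds for any $u\in\mathcal{U}_\theta$,

(4.21)

\[
\limsup_{t\to\infty}\frac1t\log E\left[e^{\theta N_t}\right]\le\inf_{u\in\mathcal{U}_\theta}\sup_{z\ge0}\frac{\mathcal{A}u(z)+\frac{\theta b}{a}zu(z)}{u(z)}.
\]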


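Define the tilted probability measure $\hat P$ by

(4.22)

\[
\frac{d\hat P}{dP}\bigg|_{\mathcal{F}_t}=\exp\left\{\int_0^t\left(\lambda(Z_s)-\hat\lambda(Z_s)\right)ds+\int_0^t\log\left(\frac{\hat\lambda(Z_s)}{\lambda(Z_s)}\right)dN_s\right\}.
\]

Notice that $\hat P$ defined in (4.22) is indeed a probability measure by the Girsanov formula. (For the theory of absolute continuity for point processes and their Girsanov formulas, we refer to Liptser and Shiryaev.)

Now by Jensen's inequality,

(4.23)

\begin{align*}
\liminf_{t\to\infty}\frac1t\log E\left[e^{\theta N_t}\right]
&=\liminf_{t\to\infty}\frac1t\log\hat E\left[\exp\left\{\theta N_t-\log\frac{d\hat P}{dP}\bigg|_{\mathcal{F}_t}\right\}\right]\\
&\ge\liminf_{t\to\infty}\hat E\left[\frac1t\theta N_t-\frac1t\log\frac{d\hat P}{dP}\bigg|_{\mathcal{F}_t}\right]\\
&=\liminf_{t\to\infty}\hat E\left[\frac1t\theta N_t-\frac1t\int_0^t\left(\lambda(Z_s)-\hat\lambda(Z_s)\right)ds-\frac1t\int_0^t\log\left(\frac{\hat\lambda(Z_s)}{\lambda(Z_s)}\right)dN_s\right].
\end{align*}

Since $N_t-\int_0^t\hat\lambda(Z_s)\,ds$ is a martingale under $\hat P$, we have

(4.24)

\[
\hat E\left[\int_0^t\log\left(\frac{\hat\lambda(Z_s)}{\lambda(Z_s)}\right)\left(dN_s-\hat\lambda(Z_s)\,ds\right)\right]=0.
\]

Therefore, by the ergodic theorem (for a reference, see Chapter 16.4 of Koralov and Sinai), for any $(\hat\lambda,\hat\pi)\in\mathcal{Q}_e$,

(4.25)

\begin{align*}
\liminf_{t\to\infty}\frac1t\log E\left[e^{\theta N_t}\right]
&\ge\liminf_{t\to\infty}\hat E\left[\frac{\theta b}{a}\frac1t\int_0^tZ_s\,ds+\frac1t\int_0^t\left(\hat\lambda(Z_s)-\lambda(Z_s)\right)ds-\frac1t\int_0^t\log\left(\frac{\hat\lambda(Z_s)}{\lambda(Z_s)}\right)\hat\lambda(Z_s)\,ds\right]\\
&=\int\frac{\theta b}{a}z\,\hat\pi(dz)+\int(\hat\lambda-\lambda)\,\hat\pi(dz)-\int\left(\log(\hat\lambda)-\log(\lambda)\right)\hat\lambda\,\hat\pi(dz).
\end{align*}

Hence,

(4.26)

\[
\liminf_{t\to\infty}\frac1t\log E\left[e^{\theta N_t}\right]\ge\sup_{(\hat\lambda,\hat\pi)\in\mathcal{Q}_e}\left\{\int\frac{\theta b}{a}z\,\hat\pi+\int(\hat\lambda-\lambda)\,\hat\pi-\int\left(\log(\hat\lambda)-\log(\lambda)\right)\hat\lambda\,\hat\pi\right\}.
\]

Recall that

(4.27)

\[
\mathcal{F}=\left\{f:f(z)=Kz+g(z)+L,\ K>\frac\theta a,\ K,L\in\mathbb{R},\ g\text{ is }C^1\text{ with compact support}\right\}.
\]

We claim that

(4.28)

\[
\inf_{f\in\mathcal{F}}\left\{\int\hat{\mathcal{A}}f(z)\,\hat\pi(dz)\right\}=
\begin{cases}
0&\text{if }(\hat\lambda,\hat\pi)\in\mathcal{Q}_e,\\
-\infty&\text{if }(\hat\lambda,\hat\pi)\in\mathcal{Q}\setminus\mathcal{Q}_e.
\end{cases}
\]

It is easy to see that for $(\hat\lambda,\hat\pi)\in\mathcal{Q}_e$ and $g$ being $C^1$ with compact support, $\int\hat{\mathcal{A}}g\,\hat\pi=0$. Next, we can find a sequence $f_n(z)\to z$ pointwise under the bound $|f_n(z)|\le\alpha z+\beta$, for some $\alpha,\beta>0$, where $f_n(z)$ is $C^1$ with compact support. But by our definition of $\mathcal{Q}$, $\int z\,\hat\pi<\infty$. So by the dominated convergence theorem, $\int\hat{\mathcal{A}}z\,\hat\pi=0$. The nontrivial part is to prove that if $\int\hat{\mathcal{A}}g\,\hat\pi=0$ for every $g\in\mathcal{G}=\{g(z)+L:g\text{ is }C^1\text{ with compact support}\}$, then $(\hat\lambda,\hat\pi)\in\mathcal{Q}_e$. We can easily check the conditions in Echeverría [32]. (For instance, $\mathcal{G}$ is dense in $C(\mathbb{R}^+)$, the set of continuous and bounded functions on $\mathbb{R}^+$ with a limit that exists at infinity, and $\hat{\mathcal{A}}$ satisfies the minimum principle, i.e. $\hat{\mathcal{A}}f(z_0)\ge0$ for any $z_0$ with $f(z_0)=\inf_{z\in\mathbb{R}^+}f(z)$. This is because at the minimum, the first derivative of $f$ vanishes and $\hat\lambda(z_0)(f(z_0+a)-f(z_0))\ge0$. The other conditions in Echeverría [32] can also be easily verified.) Thus, Echeverría [32] implies that $\hat\pi$ is an invariant measure. Now, our proof in Lemma 22 shows that it has to be unique as well. Therefore, $(\hat\lambda,\hat\pi)\in\mathcal{Q}_e$. This implies that if $(\hat\lambda,\hat\pi)\in\mathcal{Q}\setminus\mathcal{Q}_e$, there exists some $g\in\mathcal{G}$ such that $\int\hat{\mathcal{A}}g\,\hat\pi\neq0$. Now, any constant multiple of $g$ still belongs to $\mathcal{G}$ and thus $\inf_{g\in\mathcal{G}}\int\hat{\mathcal{A}}g\,\hat\pi=-\infty$ and hence $\inf_{f\in\mathcal{F}}\int\hat{\mathcal{A}}f\,\hat\pi=-\infty$ if $(\hat\lambda,\hat\pi)\in\mathcal{Q}\setminus\mathcal{Q}_e$.
Therefore,

(4.29)
(4.30)

\begin{align*}
\liminf_{t\to\infty}\frac1t\log E\left[e^{\theta N_t}\right]
&\ge\sup_{(\hat\lambda,\hat\pi)\in\mathcal{Q}}\inf_{f\in\mathcal{F}}\left\{\int\frac{\theta b}{a}z\,\hat\pi-\hat H(\hat\lambda,\hat\pi)+\int\hat{\mathcal{A}}f\,\hat\pi\right\}\\
&\ge\sup_{(\hat\lambda\hat\pi,\hat\pi)\in\mathcal{R}}\inf_{f\in\mathcal{F}}\left\{\int\frac{\theta b}{a}z\,\hat\pi-\hat H(\hat\lambda,\hat\pi)+\int\hat{\mathcal{A}}f\,\hat\pi\right\},
\end{align*}

where $\mathcal{R}=\{(\hat\lambda\hat\pi,\hat\pi):(\hat\lambda,\hat\pi)\in\mathcal{Q}\}$ and

(4.31)

\[
\hat H(\hat\lambda,\hat\pi)=\int\left[(\lambda-\hat\lambda)+\log\left(\hat\lambda/\lambda\right)\hat\lambda\right]\hat\pi.
\]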



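Define

(4.32)

\begin{align*}
F(\hat\lambda\hat\pi,\hat\pi,f)
&=\int\frac{\theta b}{a}z\,\hat\pi-\hat H(\hat\lambda,\hat\pi)+\int\hat{\mathcal{A}}f\,\hat\pi\\
&=\int\frac{\theta b}{a}z\,\hat\pi-\hat H(\hat\lambda,\hat\pi)-\int bz\frac{\partial f}{\partial z}\hat\pi+\int\left(f(z+a)-f(z)\right)\hat\lambda\hat\pi.
\end{align*}

Notice that $F$ is linear in $f$ and hence convex in $f$ and also

(4.33)

\[
\hat H(\hat\lambda,\hat\pi)=\sup_{f\in C_b(\mathbb{R}^+)}\int\left[\hat\lambda f+\lambda\left(1-e^f\right)\right]\hat\pi,
\]

where $C_b(\mathbb{R}^+)$ denotes the set of bounded functions on $\mathbb{R}^+$. The expression inside the bracket above is linear in both $\hat\pi$ and $\hat\lambda\hat\pi$. Hence $\hat H$ is weakly lower semicontinuous and convex in $(\hat\lambda\hat\pi,\hat\pi)$. Therefore, $F$ is concave in $(\hat\lambda\hat\pi,\hat\pi)$. Furthermore, for any $f=Kz+g+L\in\mathcal{F}$,

(4.34)

\begin{align*}
F(\hat\lambda\hat\pi,\hat\pi,f)
&=\int\left(\frac\theta a-K\right)bz\,\hat\pi-\hat H(\hat\lambda,\hat\pi)-\int bz\frac{\partial g}{\partial z}\hat\pi\\
&\quad+\int\left(g(z+a)-g(z)\right)\hat\lambda\hat\pi+Ka\int\hat\lambda\hat\pi.
\end{align*}

If $\hat\lambda_n\hat\pi_n\to\gamma_\infty$ and $\hat\pi_n\to\hat\pi_\infty$ weakly, then, since $g$ is $C^1$ with compact support, we have

(4.35)

\begin{align*}
&-\int bz\frac{\partial g}{\partial z}\hat\pi_n+\int\left(g(z+a)-g(z)\right)\hat\lambda_n\hat\pi_n+Ka\int\hat\lambda_n\hat\pi_n\\
&\quad\to-\int bz\frac{\partial g}{\partial z}\hat\pi_\infty+\int\left(g(z+a)-g(z)\right)\gamma_\infty+Ka\int\gamma_\infty,
\end{align*}

as $n\to\infty$. Moreover, in general, if $P_n\to P$ weakly, then, for any $f$ which is upper semicontinuous and bounded from above, we have $\limsup_n\int f\,dP_n\le\int f\,dP$. Since $\left(\frac\theta a-K\right)bz$ is continuous and nonpositive on $\mathbb{R}^+$, we have

(4.36)

\[
\limsup_{n\to\infty}\int\left(\frac\theta a-K\right)bz\,\hat\pi_n\le\int\left(\frac\theta a-K\right)bz\,\hat\pi_\infty.
\]

Hence, we conclude that $F$ is upper semicontinuous in the weak topology.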
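The variational formula (4.33) can be verified pointwise. The following short derivation is our own sketch, assuming $\hat\lambda(z)>0$ and $\lambda(z)>0$ at a fixed $z$ and writing $\varphi$ for the integrand viewed as a function of the real number $f=f(z)$:

```latex
\[
\varphi(f)=\hat\lambda f+\lambda\left(1-e^{f}\right),\qquad
\varphi'(f)=\hat\lambda-\lambda e^{f},\qquad
\varphi''(f)=-\lambda e^{f}<0,
\]
\[
f^{*}=\log\left(\frac{\hat\lambda}{\lambda}\right),\qquad
\varphi(f^{*})=\hat\lambda\log\left(\frac{\hat\lambda}{\lambda}\right)+\lambda-\hat\lambda.
\]
```

The maximal value is exactly the integrand of $\hat H$ in (4.31). When $\log(\hat\lambda/\lambda)$ is unbounded, the maximiser is not itself a bounded function, and the supremum in (4.33) would presumably be approached through bounded truncations of $f^{*}$.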



In order to switch the supremum and infimum in (4.30), since we have already
proved that $F$ is concave and upper semicontinuous in $(\hat\lambda\hat\pi,\hat\pi)$ and convex in $f$, it
is sufficient to prove the compactness of $\mathcal{R}$ to apply Ky Fan's minmax theorem
(see Fan [37]). Indeed, Joó developed a level set method and proved that it
is sufficient to show the compactness of the level sets (see Joó [SO] and Frenk and
Kassay [ID]). In other words, it suffices to prove that, for any $C\in\mathbb{R}$ and $f\in\mathcal{F}$,
the level set

(4.37)  $\left\{(\hat\lambda\hat\pi,\hat\pi)\in\mathcal{R}: H+\int bz\frac{\partial f}{\partial z}\,\hat\pi-\int\frac{\theta b}{a}z\,\hat\pi-\int\hat\lambda[f(z+a)-f(z)]\,\hat\pi\leq C\right\}$

is compact.

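Fix any $f=Kz+g+L\in\mathcal{F}$, where $K>\frac{\theta}{a}$, $g$ is $C^1$ with compact support
and $L$ is some constant. Uniformly for any pair $(\hat\lambda\hat\pi,\hat\pi)$ that is in the level set of
(4.37), there exist some $C_1,C_2>0$ such that

(4.38)  $C_1\geq H+\left(K-\frac{\theta}{a}\right)b\int z\,\hat\pi-C_2\int\hat\lambda\hat\pi$
        $\geq\int_{\hat\lambda\geq cz+\ell}\left[\lambda-\hat\lambda+\hat\lambda\log(\hat\lambda/\lambda)\right]\hat\pi+\left(K-\frac{\theta}{a}\right)b\int z\,\hat\pi-C_2\int_{\hat\lambda\geq cz+\ell}\hat\lambda\hat\pi-C_2\int_{\hat\lambda<cz+\ell}\hat\lambda\hat\pi$
        $\geq\left[\min_{z\geq 0}\log\frac{cz+\ell}{\lambda(z)}-1-C_2\right]\int_{\hat\lambda\geq cz+\ell}\hat\lambda\hat\pi+\left[-c\cdot C_2+\left(K-\frac{\theta}{a}\right)b\right]\int z\,\hat\pi-\ell C_2.$

We choose $0<c<\left(K-\frac{\theta}{a}\right)\frac{b}{C_2}$ and $\ell$ large enough so that $\min_{z\geq 0}\log\frac{cz+\ell}{\lambda(z)}-1-C_2>0$,
where we used the fact that $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$ and $\min_z\lambda(z)>0$. Hence,

(4.39)  $\int z\,\hat\pi\leq C_3,\qquad\int_{\hat\lambda\geq cz+\ell}\hat\lambda\hat\pi\leq C_4,$

where

(4.40)  $C_3=\frac{C_1+\ell C_2}{-c\cdot C_2+\left(K-\frac{\theta}{a}\right)b},\qquad C_4=\frac{C_1+\ell C_2}{\min_{z\geq 0}\log\frac{cz+\ell}{\lambda(z)}-1-C_2}.$

Therefore, we have

(4.41)  $\int\hat\lambda\hat\pi=\int_{\hat\lambda\geq cz+\ell}\hat\lambda\hat\pi+\int_{\hat\lambda<cz+\ell}\hat\lambda\hat\pi\leq C_4+c\cdot C_3+\ell,$

and hence

(4.42)  $H(\hat\lambda,\hat\pi)\leq C_1+C_2[C_4+c\cdot C_3+\ell]<\infty.$

Therefore, for any $(\hat\lambda_n\hat\pi_n,\hat\pi_n)\in\mathcal{R}$, we get

(4.43)  $\lim_{\ell\to\infty}\sup_n\int_{z\geq\ell}\hat\pi_n\leq\lim_{\ell\to\infty}\sup_n\frac{1}{\ell}\int z\,\hat\pi_n\leq\lim_{\ell\to\infty}\frac{C_3}{\ell}=0,$

which implies the tightness of $\hat\pi_n$. By Prokhorov's Theorem, there exists a subsequence
of $\hat\pi_n$ which converges weakly to $\pi_\infty$. We also want to show that there exists
some $\gamma_\infty$ such that $\hat\lambda_n\hat\pi_n\to\gamma_\infty$ weakly (passing to a subsequence if necessary). It
is enough to show that

(i) $\sup_n\int\hat\lambda_n\hat\pi_n<\infty$.

(ii) $\lim_{\ell\to\infty}\sup_n\int_{z\geq\ell}\hat\lambda_n\hat\pi_n=0$.

(i) and (ii) give us the tightness of $\hat\lambda_n\hat\pi_n$ and hence imply the weak convergence
of a subsequence.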

Now, let us prove statements (i) and (ii).
To prove (i), notice that

(4.44)  $\sup_n\int\hat\lambda_n\hat\pi_n=\sup_n\int\frac{b}{a}z\,\hat\pi_n\leq\frac{b}{a}[C_4+c\cdot C_3+\ell]<\infty.$

To prove (ii), notice that $(\lambda-\hat\lambda_n)+\hat\lambda_n\log(\hat\lambda_n/\lambda)\geq 0$. That is because $x-1-\log x\geq 0$
for any $x>0$ and hence

(4.45)  $\lambda-\hat\lambda+\hat\lambda\log(\hat\lambda/\lambda)=\hat\lambda\left[(\lambda/\hat\lambda)-1-\log(\lambda/\hat\lambda)\right]\geq 0.$



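Notice that

(4.46)  $\lim_{\ell\to\infty}\sup_n\int_{z\geq\ell}\hat\lambda_n\hat\pi_n\leq\lim_{\ell\to\infty}\sup_n\int_{\hat\lambda_n<\sqrt{\lambda z},\,z\geq\ell}\hat\lambda_n\hat\pi_n+\lim_{\ell\to\infty}\sup_n\int_{\hat\lambda_n\geq\sqrt{\lambda z},\,z\geq\ell}\hat\lambda_n\hat\pi_n.$

For the first term, since $\sup_n\int z\,\hat\pi_n<\infty$ and $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$,

(4.47)  $\lim_{\ell\to\infty}\sup_n\int_{\hat\lambda_n<\sqrt{\lambda z},\,z\geq\ell}\hat\lambda_n\hat\pi_n\leq\lim_{\ell\to\infty}\sup_n\int_{z\geq\ell}\sqrt{\lambda z}\,\hat\pi_n=0.$

For the second term, since $\limsup_{z\to\infty}\frac{\lambda(z)}{z}=0$,

(4.48)  $\lim_{\ell\to\infty}\sup_n\int_{\hat\lambda_n\geq\sqrt{\lambda z},\,z\geq\ell}\hat\lambda_n\hat\pi_n\leq\lim_{\ell\to\infty}\sup_n H(\hat\lambda_n,\hat\pi_n)\cdot\sup_{\hat\lambda_n\geq\sqrt{\lambda z},\,z\geq\ell}\frac{\hat\lambda_n}{\lambda-\hat\lambda_n+\hat\lambda_n\log(\hat\lambda_n/\lambda)}=0.$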

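Therefore, passing to some subsequence if necessary, we have $\hat\lambda_n\hat\pi_n\to\gamma_\infty$ and
$\hat\pi_n\to\pi_\infty$ weakly. Since we proved that $F$ is upper semicontinuous in the weak
topology, the level set is compact in the weak topology. Therefore, we can switch
the supremum and infimum in (4.30) and get

(4.49)  $\liminf_{t\to\infty}\frac{1}{t}\log\mathbb{E}[e^{\theta N_t}]$
(4.50)  $\geq\inf_{f\in\mathcal{F}}\sup_{\hat\pi}\sup_{\hat\lambda}\int\left[\frac{\theta b}{a}z\,\hat\pi+(\hat\lambda-\lambda)\hat\pi-\log(\hat\lambda/\lambda)\,\hat\lambda\hat\pi+\hat{\mathcal{A}}f\,\hat\pi\right]$
(4.51)  $=\inf_{f\in\mathcal{F}}\sup_{\hat\pi}\int\left[\frac{\theta bz}{a}+\lambda(z)\left(e^{f(z+a)-f(z)}-1\right)-bz\frac{\partial f}{\partial z}\right]\hat\pi(dz)$
(4.52)  $=\inf_{f\in\mathcal{F}}\sup_{z\geq 0}\left[\frac{\theta bz}{a}+\lambda(z)\left(e^{f(z+a)-f(z)}-1\right)-bz\frac{\partial f}{\partial z}\right]$
(4.53)  $=\inf_{f\in\mathcal{F}}\sup_{z\geq 0}\left\{\frac{\mathcal{A}e^{f}}{e^{f}}+\frac{\theta b}{a}z\right\}$
(4.54)  $\geq\inf_{u\in\mathcal{U}_\theta}\sup_{z\geq 0}\left\{\frac{\mathcal{A}u}{u}+\frac{\theta b}{a}z\right\}.$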

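We need some justifications. Define $G(\hat\lambda)=\hat\lambda-\log(\hat\lambda/\lambda)\hat\lambda+\hat{\mathcal{A}}f$. The supremum
of $G(\hat\lambda)$ is achieved when $\frac{\partial G}{\partial\hat\lambda}=0$, which implies $\hat\lambda=\lambda e^{f(z+a)-f(z)}$. Notice that
for $f\in\mathcal{F}$, the optimal $\hat\lambda=\lambda e^{f(z+a)-f(z)}$ satisfies $\int\hat\lambda\hat\pi<\infty$ since $\int\lambda\hat\pi<\infty$ and
$\int z\,\hat\pi<\infty$. This gives us (4.51). Next, let us explain (4.52). For any probability
measure $\hat\pi$,

(4.55)  $\int\left[\frac{\theta bz}{a}+\lambda(z)\left(e^{f(z+a)-f(z)}-1\right)-bz\frac{\partial f}{\partial z}\right]\hat\pi(dz)\leq\sup_{z\geq 0}\left[\frac{\theta bz}{a}+\lambda(z)\left(e^{f(z+a)-f(z)}-1\right)-bz\frac{\partial f}{\partial z}\right],$

which implies that the right hand side of (4.51) is less than or equal to the right hand side
of (4.52). To prove the other direction, for any $f=Kz+g+L\in\mathcal{F}$, we have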



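(4.56)  $\frac{\theta bz}{a}+\lambda(z)\left(e^{f(z+a)-f(z)}-1\right)-bz\frac{\partial f}{\partial z}$
        $=\left(\frac{\theta b}{a}-Kb\right)z+\lambda(z)\left(e^{Ka+g(z+a)-g(z)}-1\right)-bz\frac{\partial g}{\partial z},$

which is continuous in $z$ and also bounded from above on $z\in[0,\infty)$ since $g$ is $C^1$ with compact
support, $K>\frac{\theta}{a}$ and $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$. Hence there exists some $z^*\geq 0$ such
that

(4.57)  $\sup_{z\geq 0}\left[\frac{\theta bz}{a}+\lambda(z)\left(e^{f(z+a)-f(z)}-1\right)-bz\frac{\partial f}{\partial z}\right]=\frac{\theta bz^*}{a}+\lambda(z^*)\left(e^{f(z^*+a)-f(z^*)}-1\right)-bz^*\frac{\partial f}{\partial z}(z^*).$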


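Take a sequence of probability measures $\hat\pi_n$ with probability density
function $n$ if $z\in\left[z^*-\frac{1}{2n},z^*+\frac{1}{2n}\right]$ and $0$ otherwise. Then, for every $n$, $\int z\,\hat\pi_n(dz)<\infty$.
Therefore, we have

(4.58)  $\lim_{n\to\infty}\int\left[\frac{\theta bz}{a}+\lambda(z)\left(e^{f(z+a)-f(z)}-1\right)-bz\frac{\partial f}{\partial z}\right]\hat\pi_n(dz)$
        $=\lim_{n\to\infty}n\int_{z^*-\frac{1}{2n}}^{z^*+\frac{1}{2n}}\left[\frac{\theta bz}{a}+\lambda(z)\left(e^{f(z+a)-f(z)}-1\right)-bz\frac{\partial f}{\partial z}\right]dz$
        $=\frac{\theta bz^*}{a}+\lambda(z^*)\left(e^{f(z^*+a)-f(z^*)}-1\right)-bz^*\frac{\partial f}{\partial z}(z^*)$
        $=\sup_{z\geq 0}\left[\frac{\theta bz}{a}+\lambda(z)\left(e^{f(z+a)-f(z)}-1\right)-bz\frac{\partial f}{\partial z}\right].$

We conclude that the right hand side of (4.51) is greater than or equal to the right hand
side of (4.52).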



Notice that for any $f=Kz+g+L\in\mathcal{F}$,

(4.59)  $\frac{\theta bz}{a}+\lambda(z)\left(e^{f(z+a)-f(z)}-1\right)-bz\frac{\partial f}{\partial z}$
        $=b\left(\frac{\theta}{a}-K\right)z+\lambda(z)\left(e^{Ka+g(z+a)-g(z)}-1\right)-bz\frac{\partial g}{\partial z},$

whose supremum is achieved at some finite $z^*\geq 0$ since $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$, $K>\frac{\theta}{a}$
and $g\in C^1$ with compact support. Hence $\int z\,\hat\pi<\infty$ is satisfied for the optimal $\hat\pi$.
This gives us (4.52). Finally, for any $f\in\mathcal{F}$, $u=e^{f}\in\mathcal{U}_\theta$, which implies (4.54). □



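Lemma 23. Assume $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$. Then $\mathbb{E}[e^{\theta N_t}]<\infty$ for any $\theta\in\mathbb{R}$.

Proof. Observe that for any $\gamma\in\mathbb{R}$,

(4.60)  $\exp\left\{\gamma N_t-\int_0^t(e^{\gamma}-1)\lambda(Z_s)\,ds\right\}$

is a martingale. Since $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$, for any $\epsilon>0$ there exists a constant
$C_\epsilon>0$ such that $\lambda(z)\leq C_\epsilon+\epsilon z$ for any $z\geq 0$. Also,

(4.61)  $\int_0^t Z_s\,ds=\int_0^t\int_0^s h(s-u)N(du)\,ds=\int_0^t\left[\int_u^t h(s-u)\,ds\right]N(du)$
        $\leq\int_0^t\left[\int_u^\infty h(s-u)\,ds\right]N(du)=\|h\|_{L^1}N_t.$

Therefore, for any $\gamma>0$,

(4.62)  $1=\mathbb{E}\left[e^{\gamma N_t-\int_0^t(e^{\gamma}-1)\lambda(Z_s)\,ds}\right]$
        $\geq\mathbb{E}\left[e^{\gamma N_t-(e^{\gamma}-1)\int_0^t(C_\epsilon+\epsilon Z_s)\,ds}\right]$
        $\geq\mathbb{E}\left[e^{\gamma N_t-(e^{\gamma}-1)C_\epsilon t-(e^{\gamma}-1)\epsilon\|h\|_{L^1}N_t}\right].$

For any $\theta>0$, choose $\gamma>\theta$ and $\epsilon$ small enough so that $\gamma-(e^{\gamma}-1)\epsilon\|h\|_{L^1}\geq\theta$.
Then,

(4.63)  $\mathbb{E}[e^{\theta N_t}]\leq e^{(e^{\gamma}-1)C_\epsilon t}<\infty.$

□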

Now, we are ready to prove the large deviations result. 

Theorem 15. Assume $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$ and that $\lambda(\cdot)$ is continuous and bounded
below by some positive constant. Then, $(N_t/t\in\cdot)$ satisfies the large deviation principle
with the rate function $I(\cdot)$ as the Fenchel-Legendre transform of $\Gamma(\cdot)$,

(4.64)  $I(x)=\sup_{\theta\in\mathbb{R}}\{\theta x-\Gamma(\theta)\}.$



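Proof. If $\limsup_{z\to\infty}\frac{\lambda(z)}{z}=0$, then the forthcoming Lemma 25 implies that $\Gamma(\theta)<\infty$
for any $\theta$. Thus, by the Gärtner-Ellis Theorem, we have the upper bound. For the
Gärtner-Ellis Theorem and a general theory of large deviations, see for example
[30]. To prove the lower bound, it suffices to show that for any $x>0$, $\epsilon>0$, we
have

(4.65)  $\liminf_{t\to\infty}\frac{1}{t}\log\mathbb{P}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)\geq-\sup_{\theta}\{\theta x-\Gamma(\theta)\},$

where $B_\epsilon(x)$ denotes the open ball centered at $x$ with radius $\epsilon$. Let $\hat{\mathbb{P}}$ denote
the tilted probability measure with rate $\hat\lambda$ defined in Theorem 14. By Jensen's
inequality,

(4.66)  $\frac{1}{t}\log\mathbb{P}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)\geq\frac{1}{t}\log\int_{\frac{N_t}{t}\in B_\epsilon(x)}\frac{d\mathbb{P}}{d\hat{\mathbb{P}}}\,d\hat{\mathbb{P}}$
        $=\frac{1}{t}\log\hat{\mathbb{P}}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)+\frac{1}{t}\log\left[\frac{1}{\hat{\mathbb{P}}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)}\int_{\frac{N_t}{t}\in B_\epsilon(x)}\frac{d\mathbb{P}}{d\hat{\mathbb{P}}}\,d\hat{\mathbb{P}}\right]$
        $\geq\frac{1}{t}\log\hat{\mathbb{P}}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)-\frac{1}{\hat{\mathbb{P}}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)}\cdot\frac{1}{t}\hat{\mathbb{E}}\left[1_{\frac{N_t}{t}\in B_\epsilon(x)}\log\frac{d\hat{\mathbb{P}}}{d\mathbb{P}}\right].$

By the ergodic theorem,

(4.67)  $\liminf_{t\to\infty}\frac{1}{t}\log\mathbb{P}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)\geq-\Lambda(x),$

where

(4.68)  $\Lambda(x)=\inf_{(\hat\lambda,\hat\pi)\in\mathcal{Q}_e^x}\left\{\int(\lambda-\hat\lambda)\hat\pi+\int\log(\hat\lambda/\lambda)\,\hat\lambda\hat\pi\right\},$

and

(4.69)  $\mathcal{Q}_e^x=\left\{(\hat\lambda,\hat\pi)\in\mathcal{Q}_e:\int\hat\lambda(z)\hat\pi(dz)=x\right\}.$

Notice that

(4.70)  $\Gamma(\theta)=\sup_{(\hat\lambda,\hat\pi)\in\mathcal{Q}_e}\left\{\int\theta\hat\lambda\hat\pi+\int(\hat\lambda-\lambda)\hat\pi-\int\log(\hat\lambda/\lambda)\,\hat\lambda\hat\pi\right\}$
        $=\sup_x\sup_{(\hat\lambda,\hat\pi)\in\mathcal{Q}_e^x}\left\{\int\theta\hat\lambda\hat\pi+\int(\hat\lambda-\lambda)\hat\pi-\int\log(\hat\lambda/\lambda)\,\hat\lambda\hat\pi\right\}$
        $=\sup_x\sup_{(\hat\lambda,\hat\pi)\in\mathcal{Q}_e^x}\left\{\int\frac{\theta b}{a}z\,\hat\pi(dz)+\int(\hat\lambda-\lambda)\hat\pi-\int\log(\hat\lambda/\lambda)\,\hat\lambda\hat\pi\right\}$
        $=\sup_x\{\theta x-\Lambda(x)\}.$

We prove in Lemma 24 that $\Lambda(x)$ is convex in $x$, identify it as the convex conjugate
of $\Gamma(\theta)$ and thus conclude the proof. □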
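The Fenchel-Legendre transform in (4.64) can be approximated numerically once $\Gamma$ is available. The following is only a minimal sketch: the helper name `legendre_transform`, the grid bounds, and the toy choice of $\Gamma$ (the limiting log-moment generating function of a Poisson process with constant rate, not of a Hawkes process) are assumptions made for illustration, chosen because the transform is then known in closed form.

```python
import math

def legendre_transform(gamma, x, theta_lo=-10.0, theta_hi=10.0, n=200001):
    """Approximate I(x) = sup_theta {theta * x - gamma(theta)} on a uniform grid."""
    step = (theta_hi - theta_lo) / (n - 1)
    best = -math.inf
    for k in range(n):
        theta = theta_lo + k * step
        best = max(best, theta * x - gamma(theta))
    return best

# Toy example (assumption): Poisson process with constant rate lam, for which
# Gamma(theta) = lam * (e^theta - 1) and I(x) = x log(x / lam) - x + lam.
lam = 1.0
gamma_poisson = lambda theta: lam * (math.exp(theta) - 1.0)
poisson_rate = lambda x: x * math.log(x / lam) - x + lam

x = 2.0
approx = legendre_transform(gamma_poisson, x)
exact = poisson_rate(x)
print(approx, exact)
```

Since the grid supremum is taken over a subset of $\theta$ values, it can only underestimate the true supremum; refining the grid near the maximizer tightens the approximation.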



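Lemma 24. $\Lambda(x)$ in (4.68) is convex in $x$.

Proof. Define

(4.71)  $H(\hat\lambda,\hat\pi)=\int(\lambda-\hat\lambda)\hat\pi+\int\log(\hat\lambda/\lambda)\,\hat\lambda\hat\pi.$

Then,

(4.72)  $\Lambda(x)=\inf_{(\hat\lambda,\hat\pi)\in\mathcal{Q}_e^x}H(\hat\lambda,\hat\pi).$

We want to prove that $\Lambda(\alpha x_1+\beta x_2)\leq\alpha\Lambda(x_1)+\beta\Lambda(x_2)$ for any $\alpha,\beta\geq 0$ with
$\alpha+\beta=1$. For any $\epsilon>0$, we can choose $(\hat\lambda_k,\hat\pi_k)\in\mathcal{Q}_e^{x_k}$ such that $H(\hat\lambda_k,\hat\pi_k)\leq\Lambda(x_k)+\epsilon/2$,
for $k=1,2$. Set

(4.73)  $\hat\pi_3=\alpha\hat\pi_1+\beta\hat\pi_2,\qquad\hat\lambda_3=\frac{d(\alpha\hat\pi_1)}{d(\alpha\hat\pi_1+\beta\hat\pi_2)}\hat\lambda_1+\frac{d(\beta\hat\pi_2)}{d(\alpha\hat\pi_1+\beta\hat\pi_2)}\hat\lambda_2.$

Then for any test function $f$,

(4.74)  $\int\hat{\mathcal{A}}_3f\,\hat\pi_3=\alpha\int\hat{\mathcal{A}}_1f\,\hat\pi_1+\beta\int\hat{\mathcal{A}}_2f\,\hat\pi_2=0,$

which implies $(\hat\lambda_3,\hat\pi_3)\in\mathcal{Q}_e$. Furthermore,

(4.75)  $\int\hat\lambda_3\hat\pi_3=\alpha\int\hat\lambda_1\hat\pi_1+\beta\int\hat\lambda_2\hat\pi_2=\alpha x_1+\beta x_2.$

Therefore, $(\hat\lambda_3,\hat\pi_3)\in\mathcal{Q}_e^{\alpha x_1+\beta x_2}$. Finally, since $x\log x$ is a convex function,
applying Jensen's inequality, we get

(4.76)  $H(\hat\lambda_3,\hat\pi_3)=\int\left[(\lambda-\hat\lambda_3-\hat\lambda_3\log\lambda)+\hat\lambda_3\log\hat\lambda_3\right]\hat\pi_3$
        $\leq\int\left[(\lambda-\hat\lambda_3-\hat\lambda_3\log\lambda)+\alpha\frac{d\hat\pi_1}{d\hat\pi_3}\hat\lambda_1\log\hat\lambda_1+\beta\frac{d\hat\pi_2}{d\hat\pi_3}\hat\lambda_2\log\hat\lambda_2\right]\hat\pi_3$
        $=\alpha H(\hat\lambda_1,\hat\pi_1)+\beta H(\hat\lambda_2,\hat\pi_2).$

Therefore,

(4.77)  $\Lambda(\alpha x_1+\beta x_2)\leq H(\hat\lambda_3,\hat\pi_3)\leq\alpha H(\hat\lambda_1,\hat\pi_1)+\beta H(\hat\lambda_2,\hat\pi_2)\leq\alpha\Lambda(x_1)+\beta\Lambda(x_2)+\epsilon.$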



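□

Lemma 25. If $\limsup_{z\to\infty}\frac{\lambda(z)}{z}<\frac{b}{a}$, then for any

(4.78)  $\theta<\log\left(\frac{b}{a\limsup_{z\to\infty}\frac{\lambda(z)}{z}}\right)-1+\frac{a}{b}\cdot\limsup_{z\to\infty}\frac{\lambda(z)}{z},$

we have $\Gamma(\theta)<\infty$. If $\limsup_{z\to\infty}\frac{\lambda(z)}{z}=0$, then $\Gamma(\theta)<\infty$ for any $\theta\in\mathbb{R}$.

Proof. For $K>\frac{\theta}{a}$, we have $e^{Kz}\in\mathcal{U}_\theta$ and

(4.79)  $\Gamma(\theta)\leq\inf_{u\in\mathcal{U}_\theta}\sup_{z\geq 0}\left\{\frac{\mathcal{A}u}{u}+\frac{\theta b}{a}z\right\}\leq\sup_{z\geq 0}\left\{\frac{\mathcal{A}e^{Kz}}{e^{Kz}}+\frac{\theta b}{a}z\right\}$
        $=\sup_{z\geq 0}\left\{-\left(bK-\frac{\theta b}{a}\right)z+\lambda(z)(e^{Ka}-1)\right\}.$

Define the function

(4.80)  $F(K)=-K+\limsup_{z\to\infty}\frac{\lambda(z)}{bz}(e^{Ka}-1).$

Then $F(0)=0$, $F$ is convex, $F(K)\to\infty$ as $K\to\infty$, and its minimum is
attained at

(4.81)  $K^*=\frac{1}{a}\log\left(\frac{b}{a\limsup_{z\to\infty}\frac{\lambda(z)}{z}}\right)>0,$

and $F(K^*)<0$. Therefore, $\Gamma(\theta)<\infty$ for any

(4.82)  $\theta<-a\min_{K>0}\left\{-K+\limsup_{z\to\infty}\frac{\lambda(z)}{bz}\cdot(e^{Ka}-1)\right\}$
        $=\log\left(\frac{b}{a\limsup_{z\to\infty}\frac{\lambda(z)}{z}}\right)-1+\frac{a}{b}\cdot\limsup_{z\to\infty}\frac{\lambda(z)}{z}.$

If $\limsup_{z\to\infty}\frac{\lambda(z)}{z}=0$, trying $e^{Kz}\in\mathcal{U}_\theta$ for any $K>\frac{\theta}{a}$, we have $\Gamma(\theta)<\infty$ for any
$\theta$. □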
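The closed-form threshold in (4.78) can be sanity-checked against a direct numerical minimization of $F$ from (4.80). The sketch below is illustrative only: the symbol `ell` stands for $\limsup_{z\to\infty}\lambda(z)/z$, and the parameter values and the grid are arbitrary assumptions.

```python
import math

def theta_threshold(a, b, ell):
    """Closed form from (4.78): log(b / (a * ell)) - 1 + a * ell / b, for 0 < ell < b / a."""
    return math.log(b / (a * ell)) - 1.0 + a * ell / b

def F(K, a, b, ell):
    """F(K) from (4.80), with ell standing for limsup lambda(z) / z."""
    return -K + (ell / b) * (math.exp(K * a) - 1.0)

# Arbitrary illustrative parameters satisfying ell < b / a.
a, b, ell = 1.0, 2.0, 0.5

# Minimizer from (4.81) and a brute-force grid minimum over K in (0, 5].
k_star = math.log(b / (a * ell)) / a
grid_min = min(F(0.001 * k, a, b, ell) for k in range(1, 5001))

print(theta_threshold(a, b, ell), -a * F(k_star, a, b, ell), -a * grid_min)
```

All three printed numbers should agree up to the grid resolution, which confirms that $-aF(K^*)$ reduces to the expression in (4.78) for these parameters.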
4.3 Large Deviations for Markovian Nonlinear
Hawkes Processes with Sum of Exponentials 
Exciting Function 

Now let $h$ be a sum of exponentials, i.e. $h(t)=\sum_{i=1}^{d}a_ie^{-b_it}$, and let

(4.83)  $Z_i(t)=\sum_{\tau_j<t}a_ie^{-b_i(t-\tau_j)},\qquad 1\leq i\leq d,$

and $Z_t=\sum_{i=1}^{d}Z_i(t)=\sum_{\tau_j<t}h(t-\tau_j)$. It is easy to see that $(Z_1,\ldots,Z_d)$ is
Markovian in $\mathbb{R}^d$ with generator

(4.84)  $\mathcal{A}f=-\sum_{i=1}^{d}b_iz_i\frac{\partial f}{\partial z_i}+\lambda\left(\sum_{i=1}^{d}z_i\right)\cdot[f(z_1+a_1,\ldots,z_d+a_d)-f(z_1,\ldots,z_d)].$

Here $b_i>0$ for any $1\leq i\leq d$, but $a_i$ can be negative, as long as $h(t)=\sum_{i=1}^{d}a_ie^{-b_it}>0$.
In particular, $h(0)=\sum_{i=1}^{d}a_i>0$. If $a_i>0$, then $Z_i(t)\geq 0$
almost surely; if $a_i<0$, then $Z_i(t)\leq 0$ almost surely.
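The Markov dynamics behind the generator (4.84), namely exponential decay of each $Z_i$ between jumps and a shift by $a_i$ at each jump, can be simulated by thinning. The sketch below is a hedged illustration, not part of the thesis: the function name `simulate_markov_hawkes` is hypothetical, and it additionally assumes that all $a_i>0$ and that the rate function is nondecreasing and strictly positive, so that the current intensity dominates the intensity until the next jump.

```python
import math
import random

def simulate_markov_hawkes(rate, a, b, t_max, seed=0):
    """Simulate jump times on [0, t_max] for h(t) = sum_i a[i] * exp(-b[i] * t).

    Assumes every a[i] > 0 and that `rate` is nondecreasing and positive, so
    rate(Z_t) can only decrease between jumps and its current value is a
    valid thinning bound. Returns (jump_times, state_at_t_max).
    """
    rng = random.Random(seed)
    z = [0.0] * len(a)            # Markov state (Z_1, ..., Z_d), empty history
    t, jumps = 0.0, []
    while True:
        bound = rate(sum(z))      # dominates rate(Z_s) until the next jump
        w = rng.expovariate(bound)
        if t + w > t_max:
            break
        t += w
        # Deterministic decay dZ_i = -b_i Z_i dt between jumps.
        z = [zi * math.exp(-bi * w) for zi, bi in zip(z, b)]
        # Accept the candidate with probability rate(Z_t) / bound.
        if rng.random() * bound <= rate(sum(z)):
            jumps.append(t)
            z = [zi + ai for zi, ai in zip(z, a)]   # jump Z_i -> Z_i + a_i
    # Decay the state from the last candidate time to t_max.
    z = [zi * math.exp(-bi * (t_max - t)) for zi, bi in zip(z, b)]
    return jumps, z

# Illustrative sublinear rate and kernel parameters (assumptions).
rate = lambda z: math.sqrt(1.0 + z)
a = [1.0, 0.5]
b = [2.0, 1.0]
jumps, z_end = simulate_markov_hawkes(rate, a, b, t_max=50.0, seed=1)
print(len(jumps), z_end)
```

The returned state can be compared with the explicit sum in (4.83) evaluated at `t_max`, which is a convenient consistency check on the decay-and-jump bookkeeping.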

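Theorem 16. Assume $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$; $\lambda(\cdot)$ is continuous and bounded below by
a positive constant. Then,

(4.85)  $\lim_{t\to\infty}\frac{1}{t}\log\mathbb{E}[e^{\theta N_t}]=\inf_{u\in\mathcal{U}_\theta}\sup_{(z_1,\ldots,z_d)\in\mathcal{Z}}\left\{\frac{\mathcal{A}u}{u}+\frac{\theta}{\sum_{i=1}^{d}a_i}\sum_{i=1}^{d}b_iz_i\right\},$

where $\mathcal{Z}=\{(z_1,\ldots,z_d):a_iz_i\geq 0,\ 1\leq i\leq d\}$ and

(4.86)  $\mathcal{U}_\theta=\{u\in C^1(\mathbb{R}^d,\mathbb{R}^+),\ u=e^{f},\ f\in\mathcal{F}\},$

where

(4.87)  $\mathcal{F}=\left\{f=g+\frac{\theta\sum_{i=1}^{d}z_i}{\sum_{i=1}^{d}a_i}+L,\ L\in\mathbb{R},\ g\in\mathcal{G}\right\},$

where

(4.88)  $\mathcal{G}=\left\{\sum_{i=1}^{d}K\epsilon_iz_i+g,\ K>0,\ g\text{ is }C^1\text{ with compact support}\right\}.$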



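Proof. Notice that

(4.89)  $dZ_i(t)=-b_iZ_i(t)\,dt+a_i\,dN_t,\qquad 1\leq i\leq d.$

Hence, $a_iN_t=Z_i(t)-Z_i(0)+\int_0^t b_iZ_i(s)\,ds$ and

(4.90)  $\mathbb{E}[e^{\theta N_t}]=\mathbb{E}\left[\exp\left\{\frac{\theta\left(\sum_{i=1}^{d}Z_i(t)-\sum_{i=1}^{d}Z_i(0)\right)}{\sum_{i=1}^{d}a_i}+\frac{\theta}{\sum_{i=1}^{d}a_i}\int_0^t\sum_{i=1}^{d}b_iZ_i(s)\,ds\right\}\right].$

Following the same arguments as in the proof of Theorem 14, we obtain the upper
bound

(4.91)  $\limsup_{t\to\infty}\frac{1}{t}\log\mathbb{E}[e^{\theta N_t}]\leq\inf_{u\in\mathcal{U}_\theta}\sup_{(z_1,\ldots,z_d)\in\mathcal{Z}}\left\{\frac{\mathcal{A}u}{u}+\frac{\theta}{\sum_{i=1}^{d}a_i}\sum_{i=1}^{d}b_iz_i\right\}.$

As before, we can obtain the lower bound

(4.92)  $\liminf_{t\to\infty}\frac{1}{t}\log\mathbb{E}[e^{\theta N_t}]\geq\sup_{(\hat\lambda,\hat\pi)\in\mathcal{Q}_e}\int\left[\theta\hat\lambda-\lambda+\hat\lambda-\hat\lambda\log(\hat\lambda/\lambda)\right]\hat\pi(dz_1,\ldots,dz_d)$
        $\geq\sup_{(\hat\lambda,\hat\pi)\in\mathcal{Q}}\inf_{g\in\mathcal{G}}\int\left[\theta\hat\lambda-\lambda+\hat\lambda-\hat\lambda\log(\hat\lambda/\lambda)+\hat{\mathcal{A}}g\right]\hat\pi$
        $=\sup_{(\hat\lambda,\hat\pi)\in\mathcal{Q}}\inf_{f\in\mathcal{F}}\int\left[\frac{\theta\sum_{i=1}^{d}b_iz_i}{\sum_{i=1}^{d}a_i}-\lambda+\hat\lambda-\hat\lambda\log(\hat\lambda/\lambda)+\hat{\mathcal{A}}f\right]\hat\pi.$

The equality in the last line above holds by taking $f=g+L+\frac{\theta\sum_{i=1}^{d}z_i}{\sum_{i=1}^{d}a_i}\in\mathcal{F}$ for
$g\in\mathcal{G}$, where

(4.93)  $\mathcal{G}=\left\{\sum_{i=1}^{d}K\epsilon_iz_i+g,\ K>0,\ g\text{ is }C^1\text{ with compact support}\right\}.$

Here, $\epsilon_i=a_i/|a_i|$, $1\leq i\leq d$. Define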



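(4.94)  $F(\hat\lambda\hat\pi,\hat\pi,f)=\int\left[\frac{\theta\sum_{i=1}^{d}b_iz_i}{\sum_{i=1}^{d}a_i}+\hat{\mathcal{A}}f\right]\hat\pi-H(\hat\lambda,\hat\pi).$

$F$ is linear in $f$ and hence convex in $f$. Also $H$ is weakly lower semicontinuous
and convex in $(\hat\lambda\hat\pi,\hat\pi)$. Therefore, $F$ is concave in $(\hat\lambda\hat\pi,\hat\pi)$. Furthermore, for any
$f=\frac{\theta\sum_{i=1}^{d}z_i}{\sum_{i=1}^{d}a_i}+\sum_{i=1}^{d}K\epsilon_iz_i+g+L\in\mathcal{F}$,

(4.95)  $F(\hat\lambda\hat\pi,\hat\pi,f)=\int\left[\theta+\sum_{i=1}^{d}K\epsilon_ia_i\right]\hat\lambda\hat\pi-\int\sum_{i=1}^{d}K\epsilon_ib_iz_i\,\hat\pi-H(\hat\lambda,\hat\pi)+\int\hat{\mathcal{A}}g\,\hat\pi.$

If $\hat\lambda_n\hat\pi_n\to\gamma_\infty$ and $\hat\pi_n\to\pi_\infty$ weakly, then, since $g$ is $C^1$ with compact support, we
have

(4.96)  $\int\left[\theta+\sum_{i=1}^{d}K\epsilon_ia_i\right]\hat\lambda_n\hat\pi_n+\int\hat{\mathcal{A}}g\,\hat\pi_n\to\int\left[\theta+\sum_{i=1}^{d}K\epsilon_ia_i\right]\gamma_\infty+\int\hat{\mathcal{A}}g\,\pi_\infty.$

Since $-\sum_{i=1}^{d}K\epsilon_ib_iz_i$ is continuous and nonpositive on $\mathcal{Z}$, we have

(4.97)  $\limsup_{n\to\infty}\int\left[-\sum_{i=1}^{d}K\epsilon_ib_iz_i\right]\hat\pi_n\leq\int\left[-\sum_{i=1}^{d}K\epsilon_ib_iz_i\right]\pi_\infty.$

Hence, we conclude that $F$ is upper semicontinuous in the weak topology.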

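In order to apply the minmax theorem, we want to prove the compactness in
the weak topology of the level set

(4.98)  $\left\{(\hat\lambda\hat\pi,\hat\pi):-\int\left[\frac{\theta\sum_{i=1}^{d}b_iz_i}{\sum_{i=1}^{d}a_i}+\hat{\mathcal{A}}f\right]\hat\pi+H(\hat\lambda,\hat\pi)\leq C\right\}.$

For any $f=\frac{\theta\sum_{i=1}^{d}z_i}{\sum_{i=1}^{d}a_i}+\sum_{i=1}^{d}K\epsilon_iz_i+g+L\in\mathcal{F}$, where $g$ is $C^1$ with compact support
etc., there exist some $C_1,C_2>0$ such that

(4.99)  $C_1\geq H+\sum_{i=1}^{d}Kb_i\epsilon_i\int z_i\,\hat\pi-C_2\int\hat\lambda\hat\pi$
        $\geq\int_{\hat\lambda\geq c_1z_1+\cdots+c_dz_d+\ell}\left[\lambda-\hat\lambda+\hat\lambda\log(\hat\lambda/\lambda)\right]\hat\pi+\sum_{i=1}^{d}Kb_i\epsilon_i\int z_i\,\hat\pi$
        $\quad-C_2\int_{\hat\lambda\geq c_1z_1+\cdots+c_dz_d+\ell}\hat\lambda\hat\pi-C_2\int_{\hat\lambda<c_1z_1+\cdots+c_dz_d+\ell}\hat\lambda\hat\pi$
        $\geq\left[\min_{(z_1,\ldots,z_d)\in\mathcal{Z}}\log\frac{c_1z_1+\cdots+c_dz_d+\ell}{\lambda(z_1+\cdots+z_d)}-1-C_2\right]\int_{\hat\lambda\geq\sum_{i=1}^{d}c_iz_i+\ell}\hat\lambda\hat\pi$
        $\quad+\sum_{i=1}^{d}\left(-c_i\cdot C_2+Kb_i\epsilon_i\right)\int z_i\,\hat\pi-\ell C_2.$

If $a_i>0$, then $\epsilon_i>0$; pick $c_i>0$ such that $-c_i\cdot C_2+Kb_i\epsilon_i>0$. If $a_i<0$, then
$\epsilon_i<0$; pick $c_i$ such that $-c_i\cdot C_2+Kb_i\epsilon_i<0$. Finally, choose $\ell$ big enough such
that the big bracket above is positive. Then

(4.100)  $\int|z_i|\,\hat\pi\leq C_3,\qquad\int_{\hat\lambda\geq\sum_{i=1}^{d}c_iz_i+\ell}\hat\lambda\hat\pi\leq C_4.$

Hence, $\int\hat\lambda\hat\pi\leq C_5$ and $H\leq C_6$. We can use a method similar to that in the proof of
Theorem 14 to show that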



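(4.101)  $\lim_{\ell\to\infty}\sup_n\int_{|z_i|\geq\ell}\hat\lambda_n\hat\pi_n=0,\qquad 1\leq i\leq d.$

For any $(\hat\lambda_n\hat\pi_n,\hat\pi_n)\in\mathcal{R}$, we can find a subsequence that converges in the weak
topology by Prokhorov's Theorem. Therefore,

(4.102)  $\liminf_{t\to\infty}\frac{1}{t}\log\mathbb{E}[e^{\theta N_t}]$
         $\geq\sup_{(\hat\lambda,\hat\pi)\in\mathcal{Q}}\inf_{f\in\mathcal{F}}\int\left[\frac{\theta\sum_{i=1}^{d}b_iz_i}{\sum_{i=1}^{d}a_i}-\lambda+\hat\lambda-\hat\lambda\log(\hat\lambda/\lambda)+\hat{\mathcal{A}}f\right]\hat\pi$
         $=\inf_{f\in\mathcal{F}}\sup_{\hat\pi}\sup_{\hat\lambda}\int\left[\frac{\theta\sum_{i=1}^{d}b_iz_i}{\sum_{i=1}^{d}a_i}-\lambda+\hat\lambda-\hat\lambda\log(\hat\lambda/\lambda)+\hat{\mathcal{A}}f\right]\hat\pi$
         $=\inf_{f\in\mathcal{F}}\sup_{(z_1,\ldots,z_d)\in\mathcal{Z}}\left\{\frac{\theta\sum_{i=1}^{d}b_iz_i}{\sum_{i=1}^{d}a_i}+\lambda\left(e^{f(z_1+a_1,\ldots,z_d+a_d)-f(z_1,\ldots,z_d)}-1\right)-\sum_{i=1}^{d}b_iz_i\frac{\partial f}{\partial z_i}\right\}$
         $\geq\inf_{u\in\mathcal{U}_\theta}\sup_{(z_1,\ldots,z_d)\in\mathcal{Z}}\left\{\frac{\mathcal{A}u}{u}+\frac{\theta}{\sum_{i=1}^{d}a_i}\sum_{i=1}^{d}b_iz_i\right\}.$

That is because, optimizing over $\hat\lambda$, we get $\hat\lambda=\lambda e^{f(z_1+a_1,\ldots,z_d+a_d)-f(z_1,\ldots,z_d)}$, and finally
for each $f\in\mathcal{F}$, $u=e^{f}\in\mathcal{U}_\theta$. □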

Theorem 17. Assume $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$, $\lambda(\cdot)$ is continuous and bounded below by
some positive constant. Then, $(N_t/t\in\cdot)$ satisfies the large deviation principle with
the rate function $I(\cdot)$ as the Fenchel-Legendre transform of $\Gamma(\cdot)$,

(4.103)  $I(x)=\sup_{\theta\in\mathbb{R}}\{\theta x-\Gamma(\theta)\},$

where

(4.104)  $\Gamma(\theta)=\sup_{(\hat\lambda,\hat\pi)\in\mathcal{Q}_e}\int\left[\theta\hat\lambda-\lambda+\hat\lambda-\hat\lambda\log(\hat\lambda/\lambda)\right]\hat\pi.$

Proof. The proof is the same as in the case of exponential $h(\cdot)$. □

4.4 Large Deviations for a Special Class of Nonlinear
Hawkes Processes: An Approximation
Approach

We already proved in Chapter 3 a large deviation principle of $(N_t/t\in\cdot)$ for
nonlinear Hawkes processes by proving a level-3 large deviation principle first and then
applying the contraction principle. In this section, we point out that there is an
alternative approach: for a general exciting function $h(\cdot)$, we can use sums of
exponential functions to approximate $h(\cdot)$ and use the large deviations for the case
when $h(\cdot)$ is a sum of exponentials to obtain the large deviations for general $h(\cdot)$.
The advantage of approximating the general case by the case when $h$ is a sum
of exponentials is that the rate function for the large deviations when $h$ is a sum
of exponentials can be evaluated by an optimization problem, which should be
computable by some numerical scheme.

Theorem 18. Assume that $\lambda(\cdot)\geq c$ for some $c>0$, $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$ and $\lambda(\cdot)^{\alpha}$ is
Lipschitz with constant $L_\alpha$ for any $\alpha\geq 1$. Then $(N_t/t\in\cdot)$ satisfies the large
deviation principle with the rate function

(4.105)  $I(x)=\sup_{\theta\in\mathbb{R}}\{\theta x-\Gamma(\theta)\}.$

Remark 5. The proof of Theorem 18 will be given in Appendix

Remark 6. The class of nonlinear Hawkes processes with general exciting function
$h$ for which we proved the large deviation principle here is unfortunately a bit too
special. It works for a rate function like $\lambda(z)=[\log(c+z)]^{\beta}$, for example, but does
not work for $\lambda(\cdot)$ that has sublinear power law growth.
Chapter 5

Asymptotics for Nonlinear 
Hawkes Processes 



In the existing literature on nonlinear Hawkes processes, the usual assumption
is that $\lambda(\cdot)$ is $\alpha$-Lipschitz, $h(\cdot)$ is integrable and $\alpha\|h\|_{L^1}<1$. But how about
other regimes? How do the asymptotics vary in different regimes? This is the
question we try to answer in this chapter.

We divide the nonlinear Hawkes process into the following regimes.

1. $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$. This is the sublinear regime. In this regime, if we assume
that $\lambda(\cdot)$ is $\alpha$-Lipschitz, $\|h\|_{L^1}<\infty$ and $\alpha\|h\|_{L^1}<1$, then there exists a
unique stationary version of the nonlinear Hawkes process. The central limit
theorem and large deviations for this regime are proved in Zhu [113], [111]
and [112]. On the contrary, if we assume that $\|h\|_{L^1}=\infty$, then there is no
stationary version. Figure 5.1 illustrates $\lambda_t$ in this case. We will obtain the
time asymptotics for $\lambda_t$ in Section 5.1.

2. $\lim_{z\to\infty}\frac{\lambda(z)}{z}=1$ and $\|h\|_{L^1}<1$. This is the sub-critical regime. In this
regime, if we assume that $\lambda(\cdot)$ is $\alpha$-Lipschitz and $\alpha\|h\|_{L^1}<1$, then there exists
a unique stationary version of the nonlinear Hawkes process, see Brémaud
and Massoulié [14]. The central limit theorem is proved in Zhu [113]. Figure
5.3 illustrates $\lambda_t$ in this case. We will summarize some known results about
the limit theorems in Section 5.2.

3. $\lim_{z\to\infty}\frac{\lambda(z)}{z}=1$ and $\|h\|_{L^1}=1$. This is the critical regime. This regime is
very subtle. We will show in Section 5.3 that in some cases, there exists a
stationary version of the Hawkes process. In some other cases, it does not
exist. In particular, when $\lambda(z)=\nu+z$ and $\int_0^\infty th(t)\,dt<\infty$, we will prove
that $\frac{N_{tT}}{T^2}\to\int_0^t\eta_s\,ds$, where $\eta_s$ is a squared Bessel process. $N\left[T,T+\frac{t}{T}\right]$ will
converge to a Pólya process as $T\to\infty$. Figure 5.4 illustrates the behavior
of $\lambda_t$ in this case. When $h(\cdot)$ has heavy tails, i.e. $\int_0^\infty th(t)\,dt=\infty$, we will
prove that the time asymptotic behavior is different from the light tail case.

4. $\lim_{z\to\infty}\frac{\lambda(z)}{z}=1$ and $\|h\|_{L^1}>1$. This is the super-critical regime. We will
prove in Section 5.4 that $\lambda_t$ grows exponentially in $t$ in this regime, which is
consistent with what we can see in Figure 5.5.

5. $\sum_{n=0}^{\infty}\frac{1}{\lambda(n)}<\infty$. This is the explosive regime. In Section 5.5, we will first
provide a criterion for the explosion and non-explosion of nonlinear Hawkes
processes. Then, we will study the asymptotic behavior of the explosion time.
Figure 5.6 illustrates the explosion at a finite time.

Notice that if $\|h\|_{L^1}=\infty$ and $\lim_{z\to\infty}\frac{\lambda(z)}{z}=\alpha>0$, then one is in the super-critical
regime and we will see that $\lambda_t$ grows exponentially; this is discussed in Section
5.4. If $\|h\|_{L^1}=\infty$ and $\sum_{n=0}^{\infty}\frac{1}{\lambda(n)}<\infty$, then one is in the explosive regime to be
discussed in Section 5.5.

We will launch a systematic study of the time asymptotics for Hawkes processes in
different regimes. We will study the sublinear regime, sub-critical regime, critical
regime and super-critical regime in Sections 5.1, 5.2, 5.3, 5.4 respectively. Finally
in Section 5.5, we will provide a criterion for explosion and non-explosion for
Hawkes processes and obtain some asymptotics for the explosion time.




Figure 5.1: Plot of intensity At for a realization of Hawkes process. Here h(t) 
(t + l)"5 and X(z) = (1+z)*. 




Figure 5.2: Plot of intensity At for a realization of Hawkes process. Here h(t) = 
and \(z) = (1 + z)^. In this case, \\h\\ L i < oo and A(-) is sublinear and 



(t+l)3 



Lipschitz. It will converge to the unique stationary version of the Hawkes process. 
Figure 5.3: Plot of intensity $\lambda_t$ for a realization of Hawkes process. Here $h(t)=\frac{1}{(t+1)^3}$
and $\lambda(z)=1+z$. In this case, $\|h\|_{L^1}=\frac{1}{2}<1$. It is in the sub-critical regime.
This is a classical Hawkes process and it will converge to the unique stationary
version of the Hawkes process.



5.1 Sublinear Regime 



In this section, we are interested in the sublinear case $\lim_{z\to\infty}\frac{\lambda(z)}{z}=0$. If
$\|h\|_{L^1}<\infty$ and $\lambda(\cdot)$ is $\alpha$-Lipschitz and $\alpha\|h\|_{L^1}<1$, then, as Brémaud and
Massoulié [14] proved, there exists a unique stationary Hawkes process. Recently,
Karabash [63] relaxed the Lipschitz condition and proved the stability result for a
wider class of $\lambda(\cdot)$. Let $\mathbb{P}$ and $\mathbb{E}$ denote the probability measure and expectation
for the stationary Hawkes process. Then, by the ergodic theorem, we have the law of large
numbers,

(5.1)  $\frac{N_t}{t}\to\mu:=\mathbb{E}[N[0,1]],\qquad\text{as }t\to\infty.$

The central limit theorem and large deviations have already been discussed in
Chapter 2, Chapter 3 and Chapter 4.

If $\|h\|_{L^1}=\infty$, then there is no stationary version of the Hawkes process and $\lambda_t$
tends to $\infty$ as $t\to\infty$. This is the case we are going to study for the rest of
Figure 5.4: Plot of intensity $\lambda_t$ for a realization of Hawkes process. Here $h(t)=\frac{2}{(t+1)^3}$
and $\lambda(z)=1+z$. In this case, $\|h\|_{L^1}=1$, $\int_0^\infty th(t)\,dt<\infty$ and $\lambda(\cdot)$ is linear.
It is therefore in the critical regime. From the graph, we can see that $\lambda_t$ grows
linearly in $t$, which will be proved in this chapter. Indeed, we will prove that $\frac{N_{tT}}{T^2}$
converges to $\int_0^t\eta_s\,ds$ as $T\to\infty$, where $\eta_s$ is a squared Bessel process.



the subsection. We are interested in the time asymptotic behavior of the nonlinear
Hawkes process in this regime.

Let us first make a simple observation. Assume that $\lambda(z)\uparrow\infty$ as $z\to\infty$.
Then, assuming $\|h\|_{L^1}=\infty$, we have $\lambda_t\to\infty$ as $t\to\infty$ a.s. This can be seen by
noticing that $\int_0^t h(t-s)N(ds)\to\infty$ a.s. if $\|h\|_{L^1}=\infty$, where $N_t$ is a
standard Poisson process with constant rate $\lambda(0)$.

Let us prove a special case first.

Proposition 1. Assume that $h(\cdot)\equiv 1$ and $\lambda(z)=\gamma(\nu+z)^{\beta}$, where $\gamma,\nu>0$ and
$0<\beta<1$. Then,

(5.2)  $\frac{\lambda_t}{t^{\frac{\beta}{1-\beta}}}\to\gamma^{\frac{1}{1-\beta}}(1-\beta)^{\frac{\beta}{1-\beta}},$

in probability as $t\to\infty$.
Figure 5.5: Plot of intensity $\lambda_t$ for a realization of Hawkes process. Here $h(t)=\frac{3}{(t+1)^3}$
and $\lambda(z)=1+z$. In this case, $\|h\|_{L^1}=\frac{3}{2}>1$ and it is in the super-critical
regime. We expect that $\lambda_t$ would grow exponentially in this case.



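Proof. For $\alpha>0$,

(5.3)  $d\lambda_t^{\alpha}=\left[\gamma^{\alpha}(\nu+N_{t-}+1)^{\alpha\beta}-\gamma^{\alpha}(\nu+N_{t-})^{\alpha\beta}\right]dN_t=\left[\left(\lambda_{t-}^{1/\beta}+\gamma^{1/\beta}\right)^{\alpha\beta}-\lambda_{t-}^{\alpha}\right]dN_t.$

Let $\alpha=\frac{1-\beta}{\beta}$. We have

(5.4)  $\lambda_t^{\frac{1-\beta}{\beta}}=\lambda_0^{\frac{1-\beta}{\beta}}+\int_0^t\left[\left(\lambda_s^{1/\beta}+\gamma^{1/\beta}\right)^{1-\beta}-\left(\lambda_s^{1/\beta}\right)^{1-\beta}\right]\lambda_s\,ds$
       $+\int_0^t\left[\left(\lambda_{s-}^{1/\beta}+\gamma^{1/\beta}\right)^{1-\beta}-\left(\lambda_{s-}^{1/\beta}\right)^{1-\beta}\right]dM_s.$

Since $\lambda_t\to\infty$ a.s. as $t\to\infty$, by the bounded convergence theorem,

(5.5)  $\frac{1}{t}\mathbb{E}\int_0^t\left[\left(\lambda_s^{1/\beta}+\gamma^{1/\beta}\right)^{1-\beta}-\left(\lambda_s^{1/\beta}\right)^{1-\beta}\right]\lambda_s\,ds\to(1-\beta)\gamma^{1/\beta},$

as $t\to\infty$. It is not difficult to see that $\frac{1}{t}\int_0^t\left[\left(\lambda_{s-}^{1/\beta}+\gamma^{1/\beta}\right)^{1-\beta}-\left(\lambda_{s-}^{1/\beta}\right)^{1-\beta}\right]dM_s\to 0$ in
probability as $t\to\infty$. Hence, $\frac{\lambda_t^{\frac{1}{\beta}-1}}{t}\to(1-\beta)\gamma^{1/\beta}$ in probability as $t\to\infty$. □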
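When $h\equiv 1$ the process is a pure birth process: with $n$ points so far, the next point arrives after an exponential time with rate $\gamma(\nu+n)^{\beta}$. That makes the limit $\lambda_t^{\frac{1}{\beta}-1}/t\to(1-\beta)\gamma^{1/\beta}$ easy to probe by simulation. The sketch below is an illustration under assumptions: the function name `birth_process_ratio` is hypothetical, the process is started with no points, and the ratio is evaluated at the time of the last simulated jump rather than at a fixed $t$.

```python
import random

def birth_process_ratio(gamma, nu, beta, n_jumps, seed=0):
    """Return lambda_T^(1/beta - 1) / T at the time T of the n_jumps-th point.

    With h == 1, the intensity after n points is gamma * (nu + n)^beta, so
    inter-arrival times are independent exponentials with those rates.
    """
    rng = random.Random(seed)
    t = 0.0
    for n in range(n_jumps):
        t += rng.expovariate(gamma * (nu + n) ** beta)
    lam = gamma * (nu + n_jumps) ** beta
    return lam ** ((1.0 - beta) / beta) / t

# Illustrative parameters (assumptions): gamma = 1, nu = 1, beta = 1/2.
gamma, nu, beta = 1.0, 1.0, 0.5
ratio = birth_process_ratio(gamma, nu, beta, n_jumps=20000, seed=0)
limit = (1.0 - beta) * gamma ** (1.0 / beta)
print(ratio, limit)
```

For these parameters the limit is $1/2$; a finite run gives a nearby value, with a discrepancy that shrinks slowly as the number of simulated points grows.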
Figure 5.6: Plot of intensity $\lambda_t$ for a realization of Hawkes process. Here h(t) =
( t \)3 an d \(z) = (1 + z)z. This is in the explosive regime. The plot is a little bit 
cheating because it is impossible to "plot" explosion. Nevertheless, you can think 
it as an illustration. It "appears" that the process explodes near time t = 6. 



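Remark 7. Assume that $h(t)=(t+1)^{\delta}$, $\delta>-1$ and $\lambda(z)=\gamma(\nu+z)^{\beta}$, where
$\gamma,\nu>0$ and $0<\beta<1$. We conjecture that

(5.6)  $\frac{\lambda_t}{t^{\frac{(1+\delta)\beta}{1-\beta}}}\to\gamma^{\frac{1}{1-\beta}}B(\delta,\alpha)^{\frac{\beta}{1-\beta}},$

as $t\to\infty$ a.s., where $\alpha=\frac{(1+\delta)\beta}{1-\beta}$ and $B(\delta,\alpha)=\int_0^1u^{\delta}(1-u)^{\alpha}\,du$.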



5.2 Sub-Critical Regime 



In this section, we review some known results about the limit theorems in the
sub-critical regime. We say the Hawkes process is in the sub-critical regime if
$\lim_{z\to\infty}\frac{\lambda(z)}{z}=1$ and $\|h\|_{L^1}<1$. If we further assume that $\lambda(\cdot)$ is $\alpha$-Lipschitz and
$\alpha\|h\|_{L^1}<1$, then Brémaud and Massoulié [2] proved that there exists a unique
stationary Hawkes process. In this regime, we also have the law of large numbers
and the central limit theorem just as in Section 5.1. For the case when $\lambda(\cdot)$ is
nonlinear, we refer to the review in Section 5.1 for the law of large numbers and
central limit theorem.

In particular, when $\lambda(z)=\nu+z$ and $\nu>0$, we have explicit expressions for the
law of large numbers, central limit theorem and large deviation principle. They
are well known in the literature.

The ergodic theorem implies the following law of large numbers,

(5.7)  $\frac{N_t}{t}\to\frac{\nu}{1-\|h\|_{L^1}},\qquad\text{as }t\to\infty\text{ a.s.}$

Bordenave and Torrisi [11] proved a large deviation principle for $(N_t/t\in\cdot)$ with the
rate function

(5.8)  $I(x)=x\log\left(\frac{x}{\nu+x\|h\|_{L^1}}\right)-x+x\|h\|_{L^1}+\nu$ if $x\in[0,\infty)$, and $I(x)=+\infty$ otherwise.

Bacry et al. [2] proved a functional central limit theorem, stating that

(5.9)  $\frac{N_{\cdot t}-\cdot\mu t}{\sqrt{t}}\to\sigma B(\cdot),\qquad\text{as }t\to\infty,$

on $D[0,1]$ with the Skorokhod topology, where

(5.10)  $\mu=\frac{\nu}{1-\|h\|_{L^1}}\quad\text{and}\quad\sigma^2=\frac{\nu}{(1-\|h\|_{L^1})^3}.$

When $\lambda(\cdot)$ is nonlinear and sub-critical, the central limit theorem has been
obtained in Chapter 2.
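The rate function (5.8) and the law of large numbers (5.7) fit together: $I$ should vanish exactly at the limiting mean $\nu/(1-\|h\|_{L^1})$ and be positive elsewhere. The short sketch below evaluates (5.8) directly; the function name `linear_hawkes_rate_function`, the convention $0\log 0=0$ at $x=0$, and the parameter values are assumptions made for illustration.

```python
import math

def linear_hawkes_rate_function(x, nu, h_norm):
    """Evaluate (5.8) for lambda(z) = nu + z and ||h||_{L^1} = h_norm < 1."""
    if x < 0:
        return math.inf
    if x == 0:
        return nu                     # convention 0 * log 0 = 0
    return x * math.log(x / (nu + x * h_norm)) - x + x * h_norm + nu

# Illustrative parameters (assumptions).
nu, h_norm = 1.0, 0.5
mu = nu / (1.0 - h_norm)              # law of large numbers limit from (5.7)

for x in (0.0, 1.0, mu, 4.0):
    print(x, linear_hawkes_rate_function(x, nu, h_norm))
```

At $x=\mu$ the logarithm vanishes and the remaining terms cancel, so the printed value is zero up to rounding; away from $\mu$ the values are strictly positive.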
5.3 Critical Regime

In this section, we are interested in the critical regime, i.e. $\lim_{z\to\infty}\frac{\lambda(z)}{z}=1$
and $\|h\|_{L^1}=1$. This regime is very subtle. In some cases, there exists a stationary
version of the Hawkes process whilst in some cases there does not. For example,
Brémaud and Massoulié [15] proved the following.

Proposition 2 (Brémaud and Massoulié). Assume $\lambda(z)=z$, $\|h\|_{L^1}=1$ and

(5.11)  $\sup_{t\geq 0}t^{1+\alpha}h(t)\leq R,\qquad\lim_{t\to\infty}t^{1+\alpha}h(t)=r,$

for some finite constants $r,R>0$ and $0<\alpha<\frac{1}{2}$. Then, there exists a non-trivial
stationary Hawkes process with finite intensity.

Brémaud and Massoulié considered only the linear Hawkes process in their
paper [15]. If one allows a nonlinear rate function, one gets a much richer class of
Hawkes processes and in some cases, there still exists a stationary Hawkes process.
It is much easier to work with the exponential case, i.e. when $h(t)=ae^{-at}$ and
$\|h\|_{L^1}=1$.

The lecture notes by Hairer [17] provide a sufficient condition under which there
exists an invariant probability measure. Let $\mathcal{L}$ be the generator of a Markov
process. If there exists $V\geq 1$, continuous, with precompact sublevel sets, and
some function $\phi:\mathbb{R}^+\to\mathbb{R}^+$ strictly concave, increasing, with $\phi(0)=0$ and
$\phi(x)\to\infty$ as $x\to\infty$, and $\mathcal{L}V\leq K-\phi(V)$ for some $K\geq 0$, then there exists an
invariant probability measure.

Proposition 3. Assume $h(t)=ae^{-at}$, $a>0$ and $\lambda(z)=z-\psi(z)+\nu$, where $\psi(z)$
is positive, increasing, strictly concave and $\psi(z)\to\infty$ and $\frac{\psi(z)}{z}\to 0$ as $z\to\infty$. If
also $\lambda(z)$ is strictly positive, then there exists an invariant probability measure.

Proof. Let $V(z)=z+1$ and $\phi(V)=a(\psi(V)-\psi(0))$. Then $\phi:\mathbb{R}^+\to\mathbb{R}^+$
is increasing and strictly concave, $\phi(z)\to\infty$, and $\phi(0)=0$. Recall that the
generator is given by

(5.12)  $\mathcal{A}f(z)=-az\frac{\partial f}{\partial z}+\lambda(z)[f(z+a)-f(z)].$

Hence, we have

(5.13)  $\mathcal{A}V+\phi(V)=-\psi(z)a+a\psi(z+1)-a\psi(0)+a\nu\leq a\psi(1)-2a\psi(0)+a\nu.$

□
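The drift inequality (5.13) can be checked numerically for a concrete $\psi$. The sketch below is an illustration under assumptions: it uses $\psi(z)=\log(2+z)$ with arbitrary values of $a$ and $\nu$, and the helper name `drift_plus_phi` is hypothetical. It applies the generator (5.12) to $V(z)=z+1$, adds $\phi(V)=a(\psi(V)-\psi(0))$, and compares the result on a grid with the bound $a\psi(1)-2a\psi(0)+a\nu$.

```python
import math

def drift_plus_phi(z, a, nu, psi):
    """A V(z) + phi(V(z)) for V(z) = z + 1 and phi(v) = a * (psi(v) - psi(0))."""
    lam = z - psi(z) + nu
    # Generator (5.12) applied to V: V'(z) = 1 and V(z + a) - V(z) = a.
    av = -a * z + lam * a
    return av + a * (psi(z + 1.0) - psi(0.0))

# Illustrative choices (assumptions): psi(z) = log(2 + z), a = 1.5, nu = 3.
psi = lambda z: math.log(2.0 + z)
a, nu = 1.5, 3.0

bound = a * psi(1.0) - 2.0 * a * psi(0.0) + a * nu
worst = max(drift_plus_phi(0.01 * k, a, nu, psi) for k in range(100001))
print(worst, bound)
```

Because $\psi$ is concave, $\psi(z+1)-\psi(z)$ is largest at $z=0$, so the grid maximum should coincide with the bound up to rounding.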

We can generalize our result to the much wider class of $h(\cdot)$ when $h(\cdot)$ is a
sum of exponentials: $h(t)=\sum_{i=1}^{d}a_ie^{-b_it}$, where $b_i>0$ and $a_i>0$, $1\leq i\leq d$.
Write $Z_i(t)=\sum_{\tau<t}a_ie^{-b_i(t-\tau)}$. Then $Z_t=\sum_{i=1}^{d}Z_i(t)$ and $(Z_1(t),\ldots,Z_d(t))$ is
Markovian with the generator

(5.14)  $\mathcal{A}f=-\sum_{i=1}^{d}b_iz_i\frac{\partial f}{\partial z_i}+\lambda\left(\sum_{i=1}^{d}z_i\right)\cdot[f(z_1+a_1,\ldots,z_d+a_d)-f(z_1,\ldots,z_d)].$

We have the following result.

Proposition 4. Assume $h(t)=\sum_{i=1}^{d}a_ie^{-b_it}$, $b_i>0$ and $a_i>0$, $1\leq i\leq d$, and
$\|h\|_{L^1}=\sum_{i=1}^{d}\frac{a_i}{b_i}=1$. Also assume that $\lambda(z)=z-\psi(z)+\nu$, where $\psi(z)$ is positive,
increasing, strictly concave and $\psi(z)\to\infty$ and $\frac{\psi(z)}{z}\to 0$ as $z\to\infty$, and $\lambda(z)$ is
strictly positive. Then, there exists an invariant probability measure.

Proof. Let $V=\sum_{i=1}^{d}\frac{z_i}{b_i}+1$ and $\phi(V)=\psi\left(\min_{1\leq i\leq d}b_i\,V\right)-\psi(0)$. Then $\phi:\mathbb{R}^+\to\mathbb{R}^+$
is increasing and strictly concave, $\phi(z)\to\infty$ as $z\to\infty$ and $\phi(0)=0$. Using
the concavity and monotonicity of $\psi(\cdot)$, we have

(5.15)  $\mathcal{A}V+\phi(V)=-\psi\left(\sum_{i=1}^{d}z_i\right)+\psi\left(\min_{1\leq i\leq d}b_i\sum_{i=1}^{d}\frac{z_i}{b_i}+\min_{1\leq i\leq d}b_i\right)-\psi(0)+\nu$
        $\leq-\psi\left(\min_{1\leq j\leq d}b_j\sum_{i=1}^{d}\frac{z_i}{b_i}\right)+\psi\left(\min_{1\leq i\leq d}b_i\sum_{i=1}^{d}\frac{z_i}{b_i}+\min_{1\leq i\leq d}b_i\right)-\psi(0)+\nu$
        $\leq\psi\left(\min_{1\leq i\leq d}b_i\right)-2\psi(0)+\nu.$

□

Remark 8. The following $\psi(z)$ satisfy the assumptions in Proposition 4 for
sufficiently large $\nu>0$.

(i) $\psi(z)=(c_1+c_2z)^{\alpha}$, where $c_1,c_2>0$ and $0<\alpha<1$.

(ii) $\psi(z)=\log(c_3+z)$, where $c_3>1$.

Remark 9. Let $\mu$ be the invariant probability measure for $(Z_1(t),\ldots,Z_d(t))$ in
Proposition 4. Then, we have $\int\psi\left(\min_{1\leq i\leq d}b_i\left(\sum_{i=1}^{d}\frac{z_i}{b_i}+1\right)\right)\mu(dz)<\infty$.

Indeed, when $h(\cdot)$ may not be exponential or a sum of exponentials, we have
the following result.

Theorem 19. Assume X(z) = v + z — ip{z), where ib(-) : 1R + — > 1R + satisfies 
lim 2 _! >00 ijj(z) = 00 and lim^oo ^^ = and also \(z) is increasing. Also assume 
that \\h\\i,i = 1. Then there exists a stationary Hawkes process satisfying the 



dynamics (1.2). 



Proof. The proof uses Poisson embedding and follows the ideas in Bremaud and 
Massoulie |14j . Consider the canonical space of a point process on IR 2 in which 
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N is Poisson with intensity 1. Let A" = Z" = 0, t 6 R and let N° be the point 
process counting the points of N below the curve t i— y A°, i.e. iV = 0. Define 
recursively the processes A", Z™ and N n , n > as follows. 



(5.16) 

\ n t + 1 = \l! h(t-s)N n (ds)\ Z? +l = I h(t-s)N n (ds), tel, 

\J — oo J J — oo 



N n+1 (C) = / N(dt x [0,A™ +1 ]), C G B(R). 
Jc 

By our construction, A™ is an 7^ -intensity of N n (see Bremaud and Massoulie 
[H]). Since A(-) is increasing, the processes A™, Z™ and iV 1 are increasing in n 
Thus, the limit processes A t , Z t , N exist. Since A™, Z t n are stationary in t and 
increasing in n, we have 

(5.17) EA™ +1 = z/ + E[A£] / fc(t)dt - E?/>(Z n+1 ) < ^ + EA™ +1 - E^(Z n+1 ). 

Therefore, by Fatou's lemma, E[?/>(Z )] < v < oo. Thus, ip(Z t ) is finite a.s. Since 
lim^^oo ^(2) = 00, ^ is finite a.s. and thus A* is finite a.s. N, which counts the 
number of points of N below the curve t *— y A 4 , admits X t as an Tf -intensity. The 
monotonicity implies 



(5.18) A™ < A I / fc(t - s)iV(ds) J , A t > A ( / fc(t - s)N n (ds) 

Letting n — )■ 00, we complete the proof. D 



Remark 10. The following ip(z) satisfies the assumptions in Theorem 19. 
(i) ip(z) = (ci + C2z) a , where Ci,C2 > 0, < a < 1, v > c* and ac"~ 1 C2 < 1. 
(ii) ip(z) = log(c3 + z), where 1 < C3 < e v . 
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Next, let us consider the critical linear case, i.e. X(z) = v + z, v > and 
||/i||x,i=l. We also assume that m := f Q th(t)dt < oo. There is no stationary 
Hawkes process in this regime and in the rest of this subsection, we will try to 
understand its time asymptotics. 

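First, let us prove a lemma concerning the expectations of $\lambda_t$ and $N_t$.

Lemma 26. Assume $\lambda(z)=\nu+z$, $\nu>0$ and $\|h\|_{L^1}=1$ and $m=\int_0^\infty th(t)dt<\infty$.
We have

(5.19)  $$\lim_{t\to\infty}\frac{\mathbb{E}[\lambda_t]}{t}=\frac{\nu}{m},\qquad\lim_{t\to\infty}\frac{\mathbb{E}[N_t]}{t^2}=\frac{\nu}{2m}.$$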



Proof. Since

(5.20)  $$\lambda_t=\nu+\int_0^t h(t-s)dN_s,$$

taking $f(t)=\mathbb{E}[\lambda_t]$, we get

(5.21)  $$f(t)=\nu+\int_0^t h(t-s)f(s)ds=\nu+\int_0^t h(s)f(t-s)ds.$$

Taking the Laplace transform on both sides of the equation, it is easy to see that
the Laplace transform $\hat{f}$ of $f$ is given by

(5.22)  $$\hat{f}(\sigma)=\frac{\nu}{\sigma(1-\hat{h}(\sigma))}\sim\frac{\nu}{m}\cdot\frac{1}{\sigma^2},\qquad\text{as }\sigma\downarrow 0,$$

since $\hat{h}(0)=1$ by $\|h\|_{L^1}=1$ and $\frac{1-\hat{h}(\sigma)}{\sigma}\to-\hat{h}'(0)=m$. By a Tauberian theorem
(see Chapter XIII of Feller [38]), we get $\frac{f(t)}{t}\to\frac{\nu}{m}$ as $t\to\infty$. Using the simple fact
that $\mathbb{E}[N_t]=\int_0^t f(s)ds$, we complete the proof. □
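The renewal equation (5.21) can also be solved numerically, which gives a quick sanity check of the linear growth of $\mathbb{E}[\lambda_t]$ in the critical case. The sketch below is an illustration only: the left-endpoint discretization, the function name `mean_intensity` and the test kernel $h(t)=e^{-t}$ (for which $\|h\|_{L^1}=1$, $m=1$ and the exact solution is $f(t)=\nu(1+t)$) are our assumptions, not part of the text.

```python
import math


def mean_intensity(nu, h, t_max, n_steps):
    """Solve f(t) = nu + int_0^t h(t - s) f(s) ds on a uniform grid.

    Uses a left-endpoint Riemann sum, so the result is only first-order
    accurate in the step size.
    """
    dt = t_max / n_steps
    f = [nu]
    for i in range(1, n_steps + 1):
        conv = sum(h((i - j) * dt) * f[j] for j in range(i)) * dt
        f.append(nu + conv)
    return f


# Hypothetical critical kernel h(t) = exp(-t): ||h|| = 1 and m = 1,
# for which the exact solution is f(t) = nu * (1 + t).
f = mean_intensity(nu=1.0, h=lambda t: math.exp(-t), t_max=10.0, n_steps=1000)
```

With this step size the computed value at $t=10$ is close to, but slightly below, the exact value 11; refining the grid reduces the gap.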
Theorem 20. Assume $\lambda(z)=\nu+z$, $\nu>0$ and $\|h\|_{L^1}=1$, $m=\int_0^\infty th(t)dt<\infty$
and $h(\cdot)$ Lipschitz. We have the following asymptotics.
(i) As $T\to\infty$, on $D[0,1]$,

(5.23)  $$\frac{N_{tT}}{T^2}\to\int_0^t\eta_s ds,$$

where $\eta_t$ is a squared Bessel process, i.e.

(5.24)  $$d\eta_t=\frac{\nu}{m}dt+\frac{1}{m}\sqrt{\eta_t}\,dB_t,\qquad\eta_0=0.$$

(ii) $\lim_{T\to\infty}N\left[T,T+\frac{t}{T}\right]=P(t)$, where $P(t)$ is a Pólya process with parameters
$2\nu m$ (shape) and $\frac{1}{2m^2}$ (scale).

Remark 11. The fact that a squared Bessel process arises in the limit of a critical 
linear Hawkes process is not a surprise. It is well known that a critical branching 
process after certain scalings will converge to a squared Bessel process in the limit. 
This was discovered by Wei and Winnicki )10h^ . 



Remark 12. Before we proceed to the proof of Theorem 20, let us recall that a
Pólya process with parameters $\alpha$ and $\beta$ is a point process defined as follows.
Generate a positive random variable $\xi$ with Gamma distribution of parameters $\alpha$
(shape) and $\beta$ (scale). Conditional on $\xi$, $P(t)$ is a Poisson process with intensity $\xi$.
The marginal distribution of $P(t)$ is negative binomial and, unlike the usual Poisson
process, a Pólya process has dependent increments. The covariance of the increments
can be computed explicitly as $\mathrm{Cov}(P(t+\delta t)-P(t),P(t))=t\cdot\delta t\cdot\alpha\beta^2$. Peng and
Kou [92] used Pólya processes to model clustering effects in the credit markets.
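The negative binomial marginal of a Pólya process can be written down explicitly by integrating the Poisson law against the Gamma density of the random rate. The sketch below computes this probability mass function and recovers the mean $\alpha\beta t$ and the variance $\alpha\beta t+\alpha\beta^2t^2$; the function name and the parameter values are hypothetical and chosen only for illustration.

```python
import math


def polya_pmf(k, alpha, beta, t):
    """P(P(t) = k) when the rate is Gamma(shape=alpha, scale=beta)."""
    p = beta * t / (1.0 + beta * t)
    log_pmf = (math.lgamma(k + alpha) - math.lgamma(k + 1) - math.lgamma(alpha)
               + k * math.log(p) + alpha * math.log(1.0 - p))
    return math.exp(log_pmf)


alpha, beta, t = 2.0, 0.5, 3.0  # hypothetical parameter values
pmf = [polya_pmf(k, alpha, beta, t) for k in range(400)]
mean = sum(k * q for k, q in enumerate(pmf))
variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
```

The extra term $\alpha\beta^2t^2$ in the variance is the variance of the random rate times $t^2$, the same quantity $\alpha\beta^2$ that appears in the covariance of the increments.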



Proof of Theorem 20. (i) Let $H(t):=\int_t^\infty h(s)ds$. Then, we have $H(0)=1$ and
$\int_0^\infty H(t)dt=\int_0^\infty th(t)dt=m$. Let $M_t:=N_t-\int_0^t\lambda_s ds$. Let us integrate
$\lambda_s=\int_0^s h(s-u)N(du)+\nu$ over $0\leq s\leq tT$. We get

(5.25)  $$\int_0^{tT}\lambda_s ds=\int_0^{tT}\int_0^s h(s-u)dM_u ds+\int_0^{tT}\int_0^s h(s-u)\lambda_u du\,ds+\nu tT.$$

Rearranging the equation and dividing by $T$, we get

(5.26)  $$\frac{1}{T}\left[\int_0^{tT}\lambda_s ds-\int_0^{tT}\int_0^s h(s-u)\lambda_u du\,ds\right]=\frac{1}{T}\int_0^{tT}\int_0^s h(s-u)dM_u ds+\nu t.$$



Fubini's theorem implies that

(5.27)  $$\frac{1}{T}\left[\int_0^{tT}\lambda_u du-\int_0^{tT}\left(\int_0^{tT-u}h(s)ds\right)\lambda_u du\right]=\frac{1}{T}\int_0^{tT}\left(\int_0^{tT-u}h(s)ds\right)dM_u+\nu t.$$



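By the definition of $H(\cdot)$, this is equivalent to

(5.28)  $$\int_0^t TH(tT-uT)\frac{\lambda_{uT}}{T}du=\frac{M_{tT}}{T}+\nu t-\frac{1}{T}\int_0^t TH(tT-uT)\,d\left(\frac{M_{uT}}{T}\right).$$

$\frac{M_{tT}}{T}$ is a martingale and the tightness can be easily established. Furthermore, we
have

(5.29)  $$\sup_{T>0}\mathbb{E}\left[\left(\frac{M_{tT}}{T}\right)^2\right]=\sup_{T>0}\frac{1}{T^2}\mathbb{E}\left[\int_0^{tT}\lambda_s ds\right]<\infty,$$

since $\mathbb{E}[\lambda_t]\leq Ct$ for some $C>0$ by Lemma 26. This implies that the limit of
$\frac{M_{tT}}{T}$ is also a martingale.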

Moreover, $\frac{N_{tT}}{T^2}$ and $\int_0^t\frac{\lambda_{sT}}{T}ds$ are both tight. To see this, since $N_t$ and $\lambda_t$ are
nonnegative, we can think of $\left(d\left(\frac{N_{tT}}{T^2}\right),0\leq t\leq 1\right)$ and $\left(\frac{\lambda_{tT}}{T}dt,0\leq t\leq 1\right)$ as two
measures. But by Lemma 26 we know that there exists some positive constant
$C>0$, such that

(5.30)  $$\mathbb{E}\left[\frac{N_T}{T^2}\right]\leq C\quad\text{and}\quad\mathbb{E}\left[\int_0^1\frac{\lambda_{sT}}{T}ds\right]\leq C,$$

uniformly in $T>0$. Therefore, $\left(d\left(\frac{N_{tT}}{T^2}\right),0\leq t\leq 1\right)$ and $\left(\frac{\lambda_{tT}}{T}dt,0\leq t\leq 1\right)$ are
tight in the weak topology. Hence, their distribution functions $\frac{N_{tT}}{T^2}$ and $\int_0^t\frac{\lambda_{sT}}{T}ds$
are tight in $D[0,1]$ equipped with the Skorohod topology. Let us say that $\frac{M_{tT}}{T}\to\beta_t$,
$\frac{N_{tT}}{T^2}\to\psi_t$ and $\int_0^t\frac{\lambda_{sT}}{T}ds\to\phi_t$ as $T\to\infty$. Since the jumps of $\frac{N_{tT}}{T^2}$ are uniformly
bounded by $\frac{1}{T^2}$, which goes to zero as $T\to\infty$, we conclude that $\psi_t$ is continuous.
Similarly, $\beta_t$ and $\phi_t$ are continuous. Moreover, the difference



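(5.31)  $$\frac{N_{tT}}{T^2}-\int_0^t\frac{\lambda_{sT}}{T}ds=\frac{M_{tT}}{T^2}$$

is a martingale and by Doob's martingale inequality, for any $\epsilon>0$,

(5.32)  $$\mathbb{P}\left(\sup_{0\leq t\leq 1}\left|\frac{M_{tT}}{T^2}\right|\geq\epsilon\right)\leq\frac{4}{\epsilon^2T^4}\mathbb{E}\left[\int_0^T\lambda_s ds\right]\to 0,$$

as $T\to\infty$. Therefore, $\psi_t=\phi_t$. Let us denote $TH(\cdot T)$ by $H_T$, $\frac{M_{tT}}{T}$ by $M_T$
and $\int_0^t\frac{\lambda_{sT}}{T}ds$ by $\Lambda_T$. For any smooth function $K(\cdot)$ supported on $\mathbb{R}^+$, taking the
convolutions of both sides of (5.28), we get

(5.33)  $$K*H_T*\Lambda_T=K*M_T+K*(\nu t)-\frac{1}{T}K*H_T*M_T.$$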
Letting $T\to\infty$, using the fact that $\int_0^\infty H(t)dt=\int_0^\infty th(t)dt=m$, we get

(5.34)  $$m\int_0^t K(t-s)d\phi_s=\int_0^t K(t-s)(\beta_s+\nu s)ds.$$

Since this is true for any $K$, we get $\frac{d\phi_t}{dt}=\frac{\beta_t}{m}+\frac{\nu t}{m}$. Finally,

(5.35)  $$\left(\frac{M_{tT}}{T}\right)^2-\int_0^t\frac{\lambda_{sT}}{T}ds$$

is a martingale and if we let $T\to\infty$, we conclude that $\beta_t^2-\phi_t$ is a martingale. Let
$\eta_t:=\frac{d\phi_t}{dt}$. We have proved that $\frac{N_{tT}}{T^2}\to\int_0^t\eta_s ds$ weakly on $D[0,1]$ equipped with
the Skorohod topology and $\eta_t$ is a squared Bessel process,

(5.36)  $$d\eta_t=\frac{\nu}{m}dt+\frac{1}{m}\sqrt{\eta_t}\,dB_t,\qquad\eta_0=0.$$

(ii) $N\left[T,T+\frac{t}{T}\right]$ has the compensator $\int_T^{T+\frac{t}{T}}\lambda_s ds$. Observe that
$\int_T^{T+\frac{t}{T}}\lambda_s ds=T^2\int_1^{1+\frac{t}{T^2}}\frac{\lambda_{sT}}{T}ds\to\eta_1t$ as $T\to\infty$, where $\eta_1$ has a Gamma distribution with shape
$2\nu m$ and scale $\frac{1}{2m^2}$. □
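As a sanity check on the Gamma law of $\eta_1$, one can match moments. For the equation $d\eta_t=\frac{\nu}{m}dt+\frac{1}{m}\sqrt{\eta_t}\,dB_t$ with $\eta_0=0$, the mean is $\nu t/m$ and, by the Itô isometry, the variance is $\int_0^t\frac{1}{m^2}\mathbb{E}[\eta_s]ds=\frac{\nu t^2}{2m^3}$. The sketch below turns these two moments into a shape and a scale; the function name and the numerical values are hypothetical, and moment matching by itself only identifies the parameters once the law is known to be Gamma.

```python
def gamma_parameters_of_eta(nu, m, t=1.0):
    """Moment-match eta_t for d eta = (nu/m) dt + (1/m) sqrt(eta) dB, eta_0 = 0."""
    mean = nu * t / m                       # E[eta_t]
    variance = nu * t ** 2 / (2 * m ** 3)   # int_0^t (1/m^2) E[eta_s] ds
    shape = mean ** 2 / variance            # Gamma: mean = shape * scale
    scale = variance / mean                 # Gamma: variance = shape * scale^2
    return shape, scale


shape, scale = gamma_parameters_of_eta(nu=1.5, m=2.0)
```

For these values the sketch returns a shape of $2\nu m=6$ and a scale of $\frac{1}{2m^2}=0.125$.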

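Now let us consider the case when $h(\cdot)$ has a heavy tail, i.e. $\int_0^\infty th(t)dt=\infty$.
Let us first prove the following lemma.

Lemma 27. Assume that

(5.37)  $$1-\int_0^t h(s)ds=\int_t^\infty h(s)ds\sim t^{-\alpha},\qquad 0<\alpha<1.$$

Then,

(5.38)  $$\lim_{t\to\infty}\frac{\mathbb{E}[\lambda_t]}{t^{\alpha}}=\nu\cdot\frac{\sin\pi\alpha}{\pi\alpha},\qquad\lim_{t\to\infty}\frac{\mathbb{E}[N_t]}{t^{1+\alpha}}=\frac{\nu}{\Gamma(1-\alpha)\Gamma(2+\alpha)}.$$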
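Proof. The Tauberian theorem of Chapter XIII of Feller [38] says that

(5.39)  $$1-\hat{h}(\sigma)\sim\Gamma(1-\alpha)\sigma^{\alpha},\qquad\sigma\to 0^+.$$

Let $\mathbb{E}[\lambda_t]=f(t)$. This implies that

(5.40)  $$\hat{f}(\sigma)=\frac{\nu}{\sigma(1-\hat{h}(\sigma))}\sim\frac{\nu\sigma^{-1-\alpha}}{\Gamma(1-\alpha)},\qquad\sigma\to 0^+,$$

which again by a Tauberian theorem (Theorem 2 of Chapter XIII.5 of Feller [38])
implies

(5.41)  $$\int_0^t f(s)ds\sim\frac{\nu}{\Gamma(1-\alpha)\Gamma(2+\alpha)}t^{1+\alpha},\qquad t\to\infty.$$

Hence

(5.42)  $$\mathbb{E}[N_t]=\int_0^t\mathbb{E}[\lambda_s]ds=\int_0^t f(s)ds\sim\frac{\nu}{\Gamma(1-\alpha)\Gamma(2+\alpha)}t^{1+\alpha},\qquad t\to\infty.$$

Since $\mathbb{E}[\lambda_t]=\nu+\int_0^t h(t-s)d\mathbb{E}[N_s]$, it is easy to check that

(5.43)  $$\mathbb{E}[\lambda_t]=f(t)\sim\frac{\nu}{\Gamma(1-\alpha)\Gamma(1+\alpha)}t^{\alpha}=\nu\cdot\frac{\sin\pi\alpha}{\pi\alpha}t^{\alpha},\qquad t\to\infty.$$

□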
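The two forms of the constant in (5.43) agree because of the reflection formula $\Gamma(1-\alpha)\Gamma(1+\alpha)=\frac{\pi\alpha}{\sin\pi\alpha}$, and the constants for $\mathbb{E}[\lambda_t]$ and $\mathbb{E}[N_t]$ differ by the factor $1+\alpha$ since $\Gamma(2+\alpha)=(1+\alpha)\Gamma(1+\alpha)$. A short numerical check, with a hypothetical function name and hypothetical parameter values, could look as follows.

```python
import math


def heavy_tail_constants(nu, alpha):
    """Limits of E[lambda_t]/t^alpha and E[N_t]/t^(1+alpha) from Lemma 27."""
    c_lambda = nu / (math.gamma(1 - alpha) * math.gamma(1 + alpha))
    c_lambda_sine = nu * math.sin(math.pi * alpha) / (math.pi * alpha)
    c_count = nu / (math.gamma(1 - alpha) * math.gamma(2 + alpha))
    return c_lambda, c_lambda_sine, c_count


c_lambda, c_lambda_sine, c_count = heavy_tail_constants(nu=2.0, alpha=0.3)
```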



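We obtain the following law of large numbers.
Theorem 21. Assume that $\int_t^\infty h(s)ds\sim t^{-\alpha}$, $0<\alpha<1$. Then,

(5.44)  $$\frac{N_t}{t^{1+\alpha}}\to\frac{\nu}{\Gamma(1-\alpha)\Gamma(2+\alpha)}\quad\text{and}\quad\frac{\lambda_t}{t^{\alpha}}\to\nu\cdot\frac{\sin\pi\alpha}{\pi\alpha},\qquad\text{a.s. as }t\to\infty.$$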
Proof. Let $X_t=N_t-\mathbb{E}[N_t]$. Then, $X_t$ satisfies (see Bacry et al. [2])

(5.45)  $$X_t=M_t+\int_0^t\psi(t-s)M_s ds,$$

where $M_t=N_t-\int_0^t\lambda_s ds$ and $\psi=\sum_{n\geq 1}h^{*n}$. Then, by Doob's maximal inequality,
it is not hard to see that

(5.46)  $$\mathbb{E}\left[\left(\frac{N_t-\mathbb{E}[N_t]}{t^{1+\alpha}}\right)^2\right]$$
(5.47)  $$\leq\frac{1}{t^{2+2\alpha}}\mathbb{E}\left[\sup_{s\leq t}M_s^2\right]\left(1+\int_0^t\psi(t-s)ds\right)^2\to 0,$$

as $t\to\infty$ since $0<\alpha<1$. Hence,

(5.48)  $$\frac{N_t}{t^{1+\alpha}}\to\frac{\nu}{\Gamma(1-\alpha)\Gamma(2+\alpha)},$$

in $L^2$ as $t\to\infty$.



To show the almost sure convergence, we need only to show that $\frac{1}{t}\sup_{s\leq t}M_s\to 0$
a.s. as $t\to\infty$. Define $Y_t=\int_0^t\frac{1}{1+s}dM_s$. Then by Lemma 27,

(5.49)  $$\sup_{t>0}\mathbb{E}[Y_t^2]\leq\int_0^\infty\frac{\mathbb{E}[\lambda_s]}{(1+s)^2}ds<\infty.$$

By the martingale convergence theorem, $Y_t\to Y_\infty$ a.s. as $t\to\infty$. It follows that

(5.50)  $$\frac{M_t}{t+1}=Y_t-\frac{1}{t+1}\int_0^t Y_s ds\to 0,$$

a.s. as $t\to\infty$. From here, it is easy to show that $\frac{1}{t}\sup_{s\leq t}M_s\to 0$ a.s. Finally,
since $\lambda_t=\nu+\int_0^t h(t-s)N(ds)$, we conclude that

(5.51)  $$\frac{\lambda_t}{t^{\alpha}}\to\nu\cdot\frac{\sin\pi\alpha}{\pi\alpha},\qquad\text{a.s. as }t\to\infty.$$

□



5.4 Super-Critical Regime 

In this section, we are interested in the super-critical regime, i.e. $\lim_{z\to\infty}\frac{\lambda(z)}{z}=1$
and $\|h\|_{L^1}>1$. First, let us compute the asymptotics for the expectations. Let
$\theta>0$ be the unique positive number such that $\int_0^\infty e^{-\theta t}h(t)dt=1$. $\theta$ is sometimes
referred to as the Malthusian parameter in the literature. Let us also define

(5.52)  $$\bar{h}(t)=h(t)e^{-\theta t},\qquad\bar{m}=\int_0^\infty th(t)e^{-\theta t}dt.$$

Clearly under our assumptions $0<\bar{m}<\infty$ and $\|\bar{h}\|_{L^1}=1$.

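Lemma 28. (i) Assume $\lambda(z)=\nu+z$, $\nu>0$ being a constant. Then,

(5.53)  $$\lim_{t\to\infty}\frac{\mathbb{E}[\lambda_t]}{e^{\theta t}}=\frac{\nu}{\theta\bar{m}}.$$

(ii) Assume $\lim_{z\to\infty}\frac{\lambda(z)}{z}=1$ and let $\lambda(\cdot)$ be bounded below by a positive constant.
Then,

(5.54)  $$\lim_{t\to\infty}\frac{1}{t}\log\mathbb{E}[\lambda_t]=\lim_{t\to\infty}\frac{1}{t}\log\mathbb{E}[N_t]=\theta.$$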



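Proof. (i) Let $f(t)=\mathbb{E}[\lambda_t]$. We have

(5.55)  $$\frac{f(t)}{e^{\theta t}}=\frac{\nu}{e^{\theta t}}+\int_0^t h(t-s)e^{-\theta(t-s)}\frac{f(s)}{e^{\theta s}}ds=\frac{\nu}{e^{\theta t}}+\int_0^t\bar{h}(t-s)\frac{f(s)}{e^{\theta s}}ds.$$

Taking Laplace transform, we get

(5.56)  $$\widehat{\left(f(t)e^{-\theta t}\right)}(\sigma)=\frac{\nu}{(\sigma+\theta)(1-\hat{\bar{h}}(\sigma))}\sim\frac{\nu}{\theta\bar{m}}\cdot\frac{1}{\sigma},$$

as $\sigma\downarrow 0$. By the Tauberian theorem, we have

(5.57)  $$\lim_{t\to\infty}\frac{f(t)}{e^{\theta t}}=\frac{\nu}{\theta\bar{m}}.$$

(ii) is a direct consequence of (i). □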

This is consistent with the exponential case when $h(t)=ae^{-bt}$ and $a>b$. We
have

(5.58)  $$\mathbb{E}[\lambda_t]=-\frac{b\nu}{a-b}+\frac{a\nu}{a-b}e^{(a-b)t},\qquad\mathbb{E}[N_t]=-\frac{b\nu t}{a-b}+\frac{a\nu}{(a-b)^2}\left(e^{(a-b)t}-1\right).$$



Indeed, in the exponential case, $\theta=a-b$ and

(5.59)  $$d\left(Z_te^{-(a-b)t}\right)=e^{-(a-b)t}dZ_t+Z_t\,de^{-(a-b)t}=-aZ_te^{-(a-b)t}dt+ae^{-(a-b)t}dN_t.$$

Let $Y_t=Z_te^{-(a-b)t}$. We have

(5.60)  $$dY_t=-aY_tdt+ae^{-(a-b)t}dN_t=\nu ae^{-(a-b)t}dt+ae^{-(a-b)t}dM_t.$$
If we assume that $N(-\infty,0]=0$, then $Z_0=0$ and

(5.61)  $$Y_t=\int_0^t\nu ae^{-(a-b)s}ds+a\int_0^t e^{-(a-b)s}dM_s.$$

Clearly, $\int_0^t\nu ae^{-(a-b)s}ds\to\frac{\nu a}{a-b}$ and $\int_0^t ae^{-(a-b)s}dM_s$ is a martingale and

(5.62)  $$\sup_{t>0}\mathbb{E}\left[\left(\int_0^t e^{-(a-b)s}dM_s\right)^2\right]=\int_0^\infty e^{-2(a-b)s}\mathbb{E}[\lambda_s]ds<\infty.$$

Therefore, by the martingale convergence theorem, there exists some $W$ in $L^2$
such that

(5.63)  $$\frac{\lambda_t}{e^{(a-b)t}}\to\frac{\nu a}{a-b}+aW,$$

as $t\to\infty$. The convergence is a.s. and also in $L^2(\mathbb{P})$.
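The exponential example can be checked numerically. The sketch below finds the Malthusian parameter by bisection on the Laplace transform of the kernel, which for $h(t)=ae^{-bt}$ is $\frac{a}{b+\theta}$, and compares the closed form of $\mathbb{E}[\lambda_t]$ with the limit $\frac{\nu}{\theta\bar{m}}=\frac{\nu a}{a-b}$ (here $\bar{m}=\int_0^\infty tae^{-at}dt=\frac{1}{a}$). The function names, the bracketing interval and the parameter values are hypothetical.

```python
import math


def malthusian_parameter(laplace_h, lo=0.0, hi=100.0, tol=1e-12):
    """Solve laplace_h(theta) = 1 by bisection.

    Assumes laplace_h is decreasing with laplace_h(lo) > 1 > laplace_h(hi).
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if laplace_h(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mean_intensity_exponential(t, nu, a, b):
    """Closed form of E[lambda_t] for h(t) = a exp(-b t), a > b."""
    return -b * nu / (a - b) + a * nu / (a - b) * math.exp((a - b) * t)


nu, a, b = 0.5, 2.0, 1.0  # hypothetical parameter values
theta = malthusian_parameter(lambda s: a / (b + s))
limit_constant = mean_intensity_exponential(30.0, nu, a, b) * math.exp(-theta * 30.0)
```

For these values the bisection returns $\theta\approx a-b=1$ and the rescaled mean intensity is close to $\frac{\nu a}{a-b}=1$.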

For the general $h(\cdot)$ such that $\|h\|_{L^1}>1$, we may even consider the case when
$\|h\|_{L^1}=\infty$. For instance, if we assume that $h(\cdot)$ is decreasing and continuous,
then $h(\cdot)$ is bounded and all the arguments for the case $1<\|h\|_{L^1}<\infty$ would
work for the case $\|h\|_{L^1}=\infty$ as well.

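Theorem 22. Assume $\lambda(z)=\nu+z$, $\nu>0$. We have,

(5.64)  $$\frac{\lambda_t}{e^{\theta t}}\to\frac{\nu}{\theta\bar{m}}+\frac{W}{\bar{m}},\qquad\text{a.s. as }t\to\infty,$$

where $W=\int_0^\infty e^{-\theta s}dM_s$.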
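Proof. It is not very hard to observe that

(5.65)  $$\begin{aligned}\frac{\lambda_t}{e^{\theta t}}&=\frac{\nu}{e^{\theta t}}+\int_0^t\frac{h(t-s)}{e^{\theta t}}dM_s+\int_0^t\frac{h(t-s)}{e^{\theta t}}\lambda_s ds\\&=\frac{\nu}{e^{\theta t}}+\int_0^t h(t-s)e^{-\theta(t-s)}e^{-\theta s}dM_s+\int_0^t h(t-s)e^{-\theta(t-s)}\frac{\lambda_s}{e^{\theta s}}ds\\&=\frac{\nu}{e^{\theta t}}+\int_0^t\bar{h}(t-s)e^{-\theta s}dM_s+\int_0^t\bar{h}(t-s)\frac{\lambda_s}{e^{\theta s}}ds.\end{aligned}$$

Taking Laplace transform, we get

(5.66)  $$\widehat{\left(\lambda_te^{-\theta t}\right)}(\sigma)=\frac{\frac{\nu}{\sigma+\theta}+\hat{\bar{h}}(\sigma)\int_0^\infty e^{-\sigma t}e^{-\theta t}dM_t}{1-\hat{\bar{h}}(\sigma)}\sim\frac{\frac{\nu}{\theta}+W}{\bar{m}}\cdot\frac{1}{\sigma},$$

as $\sigma\downarrow 0$, where $W=\int_0^\infty e^{-\theta t}dM_t$. Notice that $W$ is well defined a.s. because
$\int_0^t e^{-\theta s}dM_s$ is a martingale and

(5.67)  $$\sup_{t>0}\mathbb{E}\left[\left(\int_0^t e^{-\theta s}dM_s\right)^2\right]=\int_0^\infty e^{-2\theta s}\mathbb{E}[\lambda_s]ds<\infty$$

by Lemma 28. Hence, by the Tauberian theorem, we conclude that, as $t\to\infty$,

(5.68)  $$\frac{\lambda_t}{e^{\theta t}}\to\frac{\nu}{\theta\bar{m}}+\frac{W}{\bar{m}},\qquad\text{a.s.}$$

□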



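Corollary 1. $\frac{N_t}{e^{\theta t}}\to\frac{\nu}{\theta^2\bar{m}}+\frac{W}{\theta\bar{m}}$ a.s. as $t\to\infty$.

Proof. Let $M_t=N_t-\int_0^t\lambda_s ds$. Then, since $M_t$ is a martingale and $\mathbb{E}[M_t^2]=\int_0^t\mathbb{E}\lambda_s ds\leq Ce^{\theta t}$
for some $C>0$, it is easy to see that $\frac{M_t}{e^{\theta t}}\to 0$ a.s. as $t\to\infty$. On
the other hand,

(5.69)  $$\frac{1}{e^{\theta t}}\int_0^t\lambda_s ds=\frac{1}{e^{\theta t}}\int_0^t\frac{\lambda_s}{e^{\theta s}}e^{\theta s}ds\to\frac{1}{\theta}\left(\frac{\nu}{\theta\bar{m}}+\frac{W}{\bar{m}}\right),$$

by Theorem 22. Hence we get the desired result. □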

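Remark 13. It would be interesting to study the properties of $W$ defined in Theorem
22. Observe that

(5.70)  $$\begin{aligned}\mathbb{E}\left[e^{-\sigma\int_0^t e^{-\theta s}dM_s}\right]&=\mathbb{E}\left[e^{-\sigma\left(\int_0^t e^{-\theta s}dN_s-\int_0^t\int_0^s h(s-u)N(du)e^{-\theta s}ds\right)}\right]e^{\frac{\sigma\nu}{\theta}(1-e^{-\theta t})}\\&=\mathbb{E}\left[e^{-\sigma\int_0^t\left(e^{-\theta s}-\int_s^t h(u-s)e^{-\theta u}du\right)N(ds)}\right]e^{\frac{\sigma\nu}{\theta}(1-e^{-\theta t})}\\&=\mathbb{E}\left[e^{-\sigma\int_0^t e^{-\theta s}\bar{H}(t-s)N(ds)}\right]e^{\frac{\sigma\nu}{\theta}(1-e^{-\theta t})},\end{aligned}$$

where $\bar{H}(t)=\int_t^\infty\bar{h}(s)ds$. Hence,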



(5.71) 



t— >oo 



where g t (s) = exp {--^e 9s H (s) + f* h(s - u)g t (u)du] - 1. 

5.5 Explosive Regime 

In this section, we will provide an explosion, non-explosion criterion for nonlinear
Hawkes processes, together with some asymptotics for the explosion time in
the explosive regime. Let $\tau_\ell=\inf\{t>0:\lambda_t\geq\ell\}$. The quantity

(5.72)  $$\lim_{\ell\to\infty}\mathbb{P}(\tau_\ell\leq t)=F(t)=\mathbb{P}(\tau\leq t),$$

is defined as the distribution function of the explosion time $\tau$. We say there is
no explosion if $F\equiv 0$, otherwise there is explosion. For a short introduction to
explosion, non-explosion, we refer to Varadhan [108].

Next, we provide an explosion, non-explosion criterion for nonlinear Hawkes
processes. The proof is based on a well known result for the explosion,
non-explosion criterion for a class of point processes which can be found in the book
by Kallenberg [61].

Theorem 23 (Explosion, Non-Explosion Criterion). Assume that $\lambda(\cdot)$ is increasing
and that $h(\cdot)$ is integrable and decreasing. Then there is explosion if and only
if

(5.73)  $$\sum_{n=0}^\infty\frac{1}{\lambda(n)}<\infty.$$
Proof. Observe that, for any $T>0$,

(5.74)  $$\mathbb{P}^{h(T)}(\tau\leq T)\leq\mathbb{P}(\tau\leq T)\leq\mathbb{P}^{h(0)}(\tau\leq T),$$

where $\mathbb{P}^{h(0)}$ denotes the probability measure for the point process such that initially
the rate function is $\lambda(0)$ and after the $n$th jump, the rate function becomes $\lambda(nh(0))$;
$\mathbb{P}^{h(T)}$ is defined similarly. It is well known that the point process with intensity
$\lambda(N_{t-})$ is explosive if and only if

(5.75)  $$\sum_{n=0}^\infty\frac{1}{\lambda(n)}<\infty.$$

For the details and proof of the above result, we refer to Kallenberg [61]. But it is
clear under our assumptions that $\sum_{n=0}^\infty\frac{1}{\lambda(n)}<\infty$ if and only if $\sum_{n=0}^\infty\frac{1}{\lambda(cn)}<\infty$,
where $c>0$ is any positive constant. Therefore, there is explosion if and only if

(5.76)  $$\sum_{n=0}^\infty\frac{1}{\lambda(n)}<\infty.$$

□
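For a pure birth process with rate $\lambda(n)$ after $n$ jumps, the sum $\sum_n\frac{1}{\lambda(n)}$ is the expected time needed to make infinitely many jumps, which is the intuition behind the criterion. The sketch below compares partial sums for a quadratic rate (convergent) and a linear rate (divergent); both rates and the function name are hypothetical examples.

```python
import math


def partial_sum_inverse_rates(lam, n_terms):
    """Partial sum of 1/lambda(n) for n = 0, ..., n_terms - 1."""
    return sum(1.0 / lam(n) for n in range(n_terms))


quadratic = lambda z: z ** 2 + 1.0  # hypothetical explosive rate
linear = lambda z: z + 1.0          # hypothetical non-explosive rate

s_quadratic = partial_sum_inverse_rates(quadratic, 100_000)
s_linear_small = partial_sum_inverse_rates(linear, 1_000)
s_linear_large = partial_sum_inverse_rates(linear, 100_000)
```

The quadratic partial sums settle near $\frac{1}{2}(1+\pi\coth\pi)$, whereas the linear ones keep growing like $\log n$.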



Evaluating the exact probability distribution of the explosion time $\tau$, i.e. $\mathbb{P}(\tau\leq t)$,
is hard and almost impossible. Nevertheless, one can still study its asymptotic
behavior, i.e.

(i) $\mathbb{P}(\tau>t)$ for large time $t$;

(ii) $\mathbb{P}(\tau\leq\epsilon)$ for small time $\epsilon$.

In the rest of this section, we will use Proposition 5 to answer (i) and Proposition
6 to answer (ii).



Proposition 5. Under the assumptions of Theorem 23, satisfying the explosion
criterion, we have

(5.77)  $$\lim_{t\to\infty}\frac{1}{t}\log\mathbb{P}(\tau>t)=\inf_{t>0}\frac{1}{t}\log\mathbb{P}(\tau>t)=-\sigma,$$

where $0<\sigma<\infty$.

Proof. For a nonlinear Hawkes process with empty history, i.e. $N(-\infty,0]=0$, we
have

(5.78)  $$\mathbb{P}(\tau>t+s)=\mathbb{P}(\tau>t+s|\tau>s)\mathbb{P}(\tau>s)\leq\mathbb{P}(\tau>t)\mathbb{P}(\tau>s).$$

Therefore, $\log\mathbb{P}(\tau>t)$ is sub-additive and we know that

(5.79)  $$\lim_{t\to\infty}\frac{1}{t}\log\mathbb{P}(\tau>t)=\inf_{t>0}\frac{1}{t}\log\mathbb{P}(\tau>t)=-\sigma$$

exists. And we also know that $0<\sigma<\infty$. For example, it is easy to see that
$\sigma\leq\lambda(0)$. That is because $\mathbb{P}(\tau>t)\geq\mathbb{P}(N[0,t]=0)=e^{-\lambda(0)t}$. To see that
$\sigma>0$, choose $M$ large enough so that $\mathbb{P}(\tau>M)<1$ and then $\sigma\geq-\frac{1}{M}\log\mathbb{P}(\tau>M)>0$. □

Remark 14. Indeed, in the Markovian case, we can say something more about $\sigma$
defined in Proposition 5. When $h(t)=ae^{-bt}$, $Z_t=\sum_{\tau<t}ae^{-b(t-\tau)}$ is Markovian
and by noticing that

(5.80)  $$\exp\left\{f(Z_t)-f(Z_0)-\int_0^t\frac{\mathcal{A}e^f}{e^f}(Z_s)ds\right\}$$

is a martingale and that $N_t$ explodes if and only if $Z_t$ explodes, we have

(5.81)  $$\lim_{t\to\infty}\frac{1}{t}\log\mathbb{P}(\tau>t)=-\sigma,$$

where $\sigma$ is the principal eigenvalue for

(5.82)  $$\mathcal{A}u=-\sigma u,\qquad u\geq 1.$$

Note that here you have to choose the test function $u\geq 1$ rather than $u\geq 0$.
Proposition 6. Assume that $\lambda(z)=\gamma z^k+\delta$, where $\gamma,\delta>0$ and $k>1$. According
to Theorem 23, it is in the explosive regime. We have the following asymptotics
for small time $\epsilon$.

(5.83)  $$\lim_{\epsilon\to 0}\epsilon^{\frac{1}{k-1}}\log\mathbb{P}(\tau\leq\epsilon)=C_k^{\frac{k}{k-1}}\left(k^{-\frac{k}{k-1}}-k^{-\frac{1}{k-1}}\right),$$

where $C_k=\int_0^\infty\log\left(\frac{\gamma h(0)^ky^k+1}{\gamma h(0)^ky^k}\right)dy$.



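Before we proceed, let us first quote de Bruijn's Tauberian theorem from the
book by Bingham, Goldie and Teugels [8], which will be used in the proof of
Proposition 6.

Theorem 24 (de Bruijn's Tauberian theorem). Let $\mu$ be a measure on $(0,\infty)$
whose Laplace transform $\mathcal{M}(\lambda):=\int_0^\infty e^{-\lambda x}d\mu(x)$ converges for all $\lambda>0$. If $\alpha<0$,
$\phi\in\mathcal{R}_\alpha(0+)$, i.e. $\phi(\lambda t)/\phi(t)\to\lambda^{\alpha}$ as $t\to 0^+$, put $\psi(\lambda):=\phi(\lambda)/\lambda\in\mathcal{R}_{\alpha-1}(0+)$;
then, for $B>0$,

(5.84)  $$-\log\mu(0,x]\sim\frac{B}{\phi^{\leftarrow}(1/x)},\qquad x\to 0^+,$$

if and only if

(5.85)  $$-\log\mathcal{M}(\lambda)\sim(1-\alpha)\left(\frac{B}{-\alpha}\right)^{\frac{\alpha}{\alpha-1}}\frac{1}{\psi^{\leftarrow}(\lambda)},\qquad\lambda\to\infty.$$

Here, $\phi^{\leftarrow}(\lambda):=\sup\{t:\phi(t)>\lambda\}$ and similarly for $\psi^{\leftarrow}$.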

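Proof of Proposition 6. First, let us observe that since we are considering the event
$\{\tau\leq\epsilon\}$ for $\epsilon>0$ very small, it is sufficient to consider the point process with
intensity $\lambda(h(0)N_{t-})$ at time $t$.
To apply de Bruijn's Tauberian theorem, notice that

(5.86)  $$\mathcal{M}(\sigma):=\mathbb{E}\left[e^{-\sigma\tau}\right]=\prod_{i=0}^\infty\frac{\lambda(ih(0))}{\sigma+\lambda(ih(0))}.$$

Recall that $\lambda(z)=\gamma z^k+\delta$, where $\gamma,\delta>0$ and $k>1$. Then,

(5.87)  $$\begin{aligned}-\log\mathcal{M}(\sigma)&=\sum_{i=0}^\infty\log\left(\frac{\gamma i^kh(0)^k+\delta+\sigma}{\gamma i^kh(0)^k+\delta}\right)\\&\geq\int_0^\infty\log\left(\frac{\gamma x^kh(0)^k+\delta+\sigma}{\gamma x^kh(0)^k+\delta}\right)dx\\&=\sigma^{\frac{1}{k}}\int_0^\infty\log\left(\frac{\gamma y^kh(0)^k+\frac{\delta}{\sigma}+1}{\gamma y^kh(0)^k+\frac{\delta}{\sigma}}\right)dy\\&\sim\sigma^{\frac{1}{k}}\int_0^\infty\log\left(\frac{\gamma y^kh(0)^k+1}{\gamma y^kh(0)^k}\right)dy,\qquad\text{as }\sigma\to\infty.\end{aligned}$$

Similarly, since the summand is decreasing in $i$,

(5.88)  $$-\log\mathcal{M}(\sigma)\leq\log\left(\frac{\delta+\sigma}{\delta}\right)+\int_0^\infty\log\left(\frac{\gamma x^kh(0)^k+\delta+\sigma}{\gamma x^kh(0)^k+\delta}\right)dx.$$

Now let $C_k=\int_0^\infty\log\left(\frac{\gamma h(0)^ky^k+1}{\gamma h(0)^ky^k}\right)dy$, $\phi(\lambda)=\lambda^{1-k}$, $\psi(\lambda)=\lambda^{-k}$ and $\alpha=1-k<0$.
Then $\phi^{\leftarrow}(1/\epsilon)=(1/\epsilon)^{-\frac{1}{k-1}}$ and $\psi^{\leftarrow}(\sigma)=\sigma^{-\frac{1}{k}}$. To apply the theorem, we need to solve
for $B$ such that

(5.89)  $$(1-\alpha)\left(\frac{B}{-\alpha}\right)^{\frac{\alpha}{\alpha-1}}=k\left(\frac{B}{k-1}\right)^{\frac{k-1}{k}}=C_k,$$

which gives $B=(k-1)(C_k/k)^{\frac{k}{k-1}}$. Therefore,

(5.90)  $$\lim_{\epsilon\to 0}\epsilon^{\frac{1}{k-1}}\log\mathbb{P}(\tau\leq\epsilon)=-B=C_k^{\frac{k}{k-1}}\left(k^{-\frac{k}{k-1}}-k^{-\frac{1}{k-1}}\right).$$

□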
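The last step of the proof is pure algebra: the constant $B=(k-1)(C_k/k)^{\frac{k}{k-1}}$ obtained from de Bruijn's theorem has to equal minus the right-hand side of (5.83). The sketch below checks this identity numerically for a given value of $C_k$; the function name and the sample values of $C_k$ and $k$ are hypothetical, and $C_k$ is treated as an input rather than computed from its integral.

```python
def small_time_constants(c_k, k):
    """Return (B, limit), where B solves k * (B / (k - 1))**((k - 1) / k) = c_k
    and limit is the right-hand side of (5.83)."""
    big_b = (k - 1.0) * (c_k / k) ** (k / (k - 1.0))
    limit = c_k ** (k / (k - 1.0)) * (k ** (-k / (k - 1.0)) - k ** (-1.0 / (k - 1.0)))
    return big_b, limit


big_b, limit = small_time_constants(c_k=1.0, k=2.0)
big_b_other, limit_other = small_time_constants(c_k=3.7, k=2.5)
```

For $C_k=1$ and $k=2$ this gives $B=0.25$ and a limit of $-0.25$.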



Chapter 6 



Limit Theorems for Marked 



Hawkes Processes 



6.1 Introduction and Main Results 

6.1.1 Introduction 

We consider in this chapter a linear Hawkes process with random marks. Let
$N_t$ be a simple point process. $N_t$ denotes the number of points in the interval $[0,t)$.
Let $\mathcal{F}_t$ be the natural filtration up to time $t$. We assume that $N(-\infty,0]=0$. At
time $t$, the point process has $\mathcal{F}_t$-predictable intensity

(6.1)  $$\lambda_t:=\nu+Z_t,\qquad Z_t:=\sum_{\tau_i<t}h(t-\tau_i,a_i),$$

where $\nu>0$, the $(\tau_i)_{i\geq 1}$ are arrival times of the points, and the $(a_i)_{i\geq 1}$ are i.i.d.
random marks, $a_i$ being independent of previous arrival times $\tau_j$, $j\leq i$. Let
us assume that $a_i$ has a common distribution $q(da)$ on a metric space $\mathbb{X}$. Here,
$h(\cdot,\cdot):\mathbb{R}^+\times\mathbb{X}\to\mathbb{R}^+$ is integrable, i.e. $\int_0^\infty\int_{\mathbb{X}}h(t,a)q(da)dt<\infty$. Let
$H(a):=\int_0^\infty h(t,a)dt$ for any $a\in\mathbb{X}$. We also assume that

(6.2)  $$\int_{\mathbb{X}}H(a)q(da)<1.$$

Let $\mathbb{P}^q$ denote the probability measure for the $a_i$'s with the common law $q(da)$.

Under assumption (6.2), it is well known that there exists a unique stationary
version of the linear marked Hawkes process satisfying the dynamics (6.1) and
that by the ergodic theorem, a law of large numbers holds,

(6.3)  $$\lim_{t\to\infty}\frac{N_t}{t}=\frac{\nu}{1-\mathbb{E}^q[H(a)]}.$$



This chapter is organized as follows. In Section 6.1.2, we will introduce the main
results of this chapter, i.e. the central limit theorem and the large deviation principle
for linear marked Hawkes processes. The proof of the central limit theorem will be
given in Section 6.2 and the proof of the large deviation principle will be given in
Section 6.3. Finally, we will discuss an application of our results to a risk model
in finance in Section 6.4.



6.1.2 Main Results 



For a linear marked Hawkes process satisfying the dynamics (6.1), we have the 
following large deviation principle. 



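Theorem 25 (Large Deviation Principle). Assume the conditions (6.2) and

(6.4)  $$\lim_{x\to\infty}\left\{\int_{\mathbb{X}}e^{H(a)x}q(da)-x\right\}=\infty.$$

Then, $\left(\frac{N_t}{t}\in\cdot\right)$ satisfies a large deviation principle with rate function

$$\Lambda(x):=\begin{cases}\inf_{\hat{q}}\left\{x\mathbb{E}^{\hat{q}}[H(a)]+\nu-x+x\log\left(\frac{x}{x\mathbb{E}^{\hat{q}}[H(a)]+\nu}\right)+x\mathbb{E}^{\hat{q}}\left[\log\frac{d\hat{q}}{dq}\right]\right\}&x\geq 0\\+\infty&x<0\end{cases}$$
$$=\begin{cases}\theta_*x-\nu(x_*-1)&x\geq 0\\+\infty&x<0,\end{cases}$$

where the infimum over $\hat{q}$ is taken over $\mathcal{M}(\mathbb{X})$, the space of probability measures on $\mathbb{X}$
such that $\hat{q}$ is absolutely continuous w.r.t. $q$. Here, $\theta_*$ and $x_*$ satisfy the following
equations

(6.5)  $$x_*=\mathbb{E}^q\left[e^{\theta_*+(x_*-1)H(a)}\right],\qquad\frac{x}{\nu}=x_*+\frac{x}{\nu}\mathbb{E}^q\left[H(a)e^{\theta_*+(x_*-1)H(a)}\right].$$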

Theorem 26 (Central Limit Theorem). Assume $\lim_{t\to\infty}t^{1/2}\int_t^\infty\mathbb{E}^q[h(s,a)]ds=0$
and that (6.2) holds. Then,

(6.6)  $$\frac{N_t-\frac{\nu t}{1-\mathbb{E}^q[H(a)]}}{\sqrt{t}}\to N\left(0,\frac{\nu(1+\mathrm{Var}^q[H(a)])}{(1-\mathbb{E}^q[H(a)])^3}\right),$$

in distribution as $t\to\infty$.
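To make the centering and the variance in the central limit theorem concrete, the sketch below evaluates $\mu=\frac{\nu}{1-\mathbb{E}^q[H(a)]}$ and $\frac{\nu(1+\mathrm{Var}^q[H(a)])}{(1-\mathbb{E}^q[H(a)])^3}$ for a mark distribution with finitely many atoms. The finite-atom representation as a list of `(H value, probability)` pairs and the function name are our assumptions for illustration only.

```python
def clt_parameters(nu, marks):
    """Return (mu, sigma2) for marks given as a list of (H value, probability)."""
    mean_h = sum(p * big_h for big_h, p in marks)
    if mean_h >= 1.0:
        raise ValueError("condition (6.2) fails: E[H(a)] must be below 1")
    var_h = sum(p * (big_h - mean_h) ** 2 for big_h, p in marks)
    mu = nu / (1.0 - mean_h)
    sigma2 = nu * (1.0 + var_h) / (1.0 - mean_h) ** 3
    return mu, sigma2


# Deterministic mark H = 0.5: the unmarked case with ||h|| = 0.5.
mu, sigma2 = clt_parameters(1.0, [(0.5, 1.0)])
# Two equally likely marks with the same mean increase the variance.
mu_mixed, sigma2_mixed = clt_parameters(1.0, [(0.25, 0.5), (0.75, 0.5)])
```

With a deterministic mark the variance reduces to $\frac{\nu}{(1-\|h\|_{L^1})^3}$, and randomness in the marks can only increase it.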
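6.2 Proof of Central Limit Theorem

Proof of Theorem 26. First, let us observe that

(6.7)  $$\int_0^t\lambda_s ds=\nu t+\sum_{\tau_i<t}\int_{\tau_i}^t h(s-\tau_i,a_i)ds=\nu t+\sum_{\tau_i<t}H(a_i)-\mathcal{E}_t,$$

where the error term $\mathcal{E}_t$ is given by

(6.8)  $$\mathcal{E}_t:=\sum_{\tau_i<t}\int_t^\infty h(s-\tau_i,a_i)ds.$$

Therefore,

(6.9)  $$\begin{aligned}\frac{N_t-\int_0^t\lambda_s ds}{\sqrt{t}}&=\frac{N_t-\nu t-\sum_{\tau_i<t}H(a_i)}{\sqrt{t}}+\frac{\mathcal{E}_t}{\sqrt{t}}\\&=(1-\mathbb{E}^q[H(a)])\frac{N_t-\mu t}{\sqrt{t}}+\frac{\mathbb{E}^q[H(a)]N_t-\sum_{\tau_i<t}H(a_i)}{\sqrt{t}}+\frac{\mathcal{E}_t}{\sqrt{t}},\end{aligned}$$

where $\mu:=\frac{\nu}{1-\mathbb{E}^q[H(a)]}$. Rearranging the terms in (6.9), we get

(6.10)  $$\frac{N_t-\mu t}{\sqrt{t}}=\frac{1}{1-\mathbb{E}^q[H(a)]}\left[\frac{N_t-\int_0^t\lambda_s ds}{\sqrt{t}}+\frac{\sum_{\tau_i<t}(H(a_i)-\mathbb{E}^q[H(a)])}{\sqrt{t}}-\frac{\mathcal{E}_t}{\sqrt{t}}\right].$$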


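It is easy to check that $\frac{\mathcal{E}_t}{\sqrt{t}}\to 0$ in probability as $t\to\infty$. To see this, first
notice that $\mathbb{E}[\lambda_t]\leq\frac{\nu}{1-\mathbb{E}^q[H(a)]}$ uniformly in $t$. Let $g(t,a):=\int_t^\infty h(s,a)ds$. We have
$\mathcal{E}_t=\sum_{\tau_i<t}g(t-\tau_i,a_i)$ and thus

(6.11)  $$\begin{aligned}\mathbb{E}[\mathcal{E}_t]&=\int_0^t\int_{\mathbb{X}}g(t-s,a)q(da)\mathbb{E}[\lambda_s]ds\\&\leq\frac{\nu}{1-\mathbb{E}^q[H(a)]}\int_0^t\int_{\mathbb{X}}g(t-s,a)q(da)ds\\&=\frac{\nu}{1-\mathbb{E}^q[H(a)]}\int_0^t\mathbb{E}^q[g(s,a)]ds.\end{aligned}$$

Hence, by L'Hôpital's rule,

(6.12)  $$\lim_{t\to\infty}\frac{1}{t^{1/2}}\int_0^t\mathbb{E}^q[g(s,a)]ds=\lim_{t\to\infty}\frac{\mathbb{E}^q[g(t,a)]}{\frac{1}{2}t^{-1/2}}=\lim_{t\to\infty}2t^{1/2}\int_t^\infty\mathbb{E}^q[h(s,a)]ds=0.$$

Hence, $\frac{\mathcal{E}_t}{\sqrt{t}}\to 0$ in probability as $t\to\infty$.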

Furthermore, $M_1(t):=N_t-\int_0^t\lambda_s ds$ and $M_2(t):=\sum_{\tau_i<t}(H(a_i)-\mathbb{E}^q[H(a)])$ are
both martingales.

Moreover, since $\int_0^t\lambda_s ds$ is of finite variation, the quadratic variation of $M_1(t)+M_2(t)$
is the same as the quadratic variation of $N_t+M_2(t)$. And notice that
$N_t+M_2(t)=\sum_{\tau_i<t}(1+H(a_i)-\mathbb{E}^q[H(a)])$, which has quadratic variation

(6.13)  $$\sum_{\tau_i<t}(1+H(a_i)-\mathbb{E}^q[H(a)])^2.$$

By the standard law of large numbers, we have

(6.14)  $$\begin{aligned}\frac{1}{t}\sum_{\tau_i<t}(1+H(a_i)-\mathbb{E}^q[H(a)])^2&=\frac{N_t}{t}\cdot\frac{1}{N_t}\sum_{\tau_i<t}(1+H(a_i)-\mathbb{E}^q[H(a)])^2\\&\to\frac{\nu}{1-\mathbb{E}^q[H(a)]}\cdot\mathbb{E}^q\left[(1+H(a)-\mathbb{E}^q[H(a)])^2\right]\\&=\frac{\nu(1+\mathrm{Var}^q[H(a)])}{1-\mathbb{E}^q[H(a)]},\end{aligned}$$

a.s. as $t\to\infty$. By a standard martingale central limit theorem, we conclude that

(6.15)  $$\frac{N_t-\frac{\nu t}{1-\mathbb{E}^q[H(a)]}}{\sqrt{t}}\to N\left(0,\frac{\nu(1+\mathrm{Var}^q[H(a)])}{(1-\mathbb{E}^q[H(a)])^3}\right),$$

in distribution as $t\to\infty$. □

6.3 Proof of Large Deviation Principle 

6.3.1 Limit of a Logarithmic Moment Generating Function 

In this subsection, we prove the existence of the limit of the logarithmic moment
generating function $\lim_{t\to\infty}\frac{1}{t}\log\mathbb{E}[e^{\theta N_t}]$ and give a variational formula and a more
explicit formula for this limit.

Theorem 27. The limit $\Gamma(\theta)$ of the logarithmic moment generating function is

(6.16)  $$\Gamma(\theta)=\lim_{t\to\infty}\frac{1}{t}\log\mathbb{E}[e^{\theta N_t}]=\begin{cases}\nu(f(\theta)-1)&\text{if }\theta\in(-\infty,\theta_c]\\+\infty&\text{otherwise,}\end{cases}$$

where $f(\theta)$ is the minimal solution to $x=\int_{\mathbb{X}}e^{\theta+H(a)(x-1)}q(da)$ and

(6.17)  $$\theta_c=-\log\int_{\mathbb{X}}H(a)e^{H(a)(x_c-1)}q(da)>0,$$

where $x_c>1$ satisfies the equation $x\int_{\mathbb{X}}H(a)e^{H(a)(x-1)}q(da)=\int_{\mathbb{X}}e^{H(a)(x-1)}q(da)$.
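The quantities in Theorem 27 are easy to compute when the mark distribution has finitely many atoms. The sketch below obtains the minimal solution $f(\theta)$ by iterating the map $x\mapsto\mathbb{E}^q[e^{\theta+H(a)(x-1)}]$ from $x=0$ (the map is increasing, so the iterates increase to the minimal fixed point when one exists), and finds $x_c$ and $\theta_c$ by bisection. The finite-atom representation, the function names and the tolerances are our assumptions; near $\theta_c$ the fixed-point iteration converges slowly.

```python
import math


def minimal_solution(theta, marks, tol=1e-13, max_iter=100_000):
    """Minimal x with x = E[exp(theta + H (x - 1))]; math.inf if the iterates blow up.

    marks is a list of (H value, probability) pairs (an illustrative assumption).
    """
    x = 0.0
    for _ in range(max_iter):
        try:
            x_new = sum(p * math.exp(theta + big_h * (x - 1.0)) for big_h, p in marks)
        except OverflowError:
            return math.inf
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x  # not converged within max_iter; last iterate as an approximation


def critical_values(marks):
    """Return (x_c, theta_c) from x E[H e^{H(x-1)}] = E[e^{H(x-1)}] and (6.17)."""
    def g(x):
        weighted = sum(p * big_h * math.exp(big_h * (x - 1.0)) for big_h, p in marks)
        plain = sum(p * math.exp(big_h * (x - 1.0)) for big_h, p in marks)
        return x * weighted - plain

    lo, hi = 1.0, 2.0
    while g(hi) < 0.0:  # g is increasing on x > 0 and g(1) = E[H] - 1 < 0
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    x_c = 0.5 * (lo + hi)
    theta_c = -math.log(sum(p * big_h * math.exp(big_h * (x_c - 1.0)) for big_h, p in marks))
    return x_c, theta_c


marks = [(0.5, 1.0)]  # deterministic mark H = 0.5, i.e. the unmarked case
x_c, theta_c = critical_values(marks)
f_zero = minimal_solution(0.0, marks)
f_large = minimal_solution(1.0, marks)
```

For the deterministic mark $H\equiv 0.5$ this gives $x_c=2$ and $\theta_c=\log 2-0.5$, while $f(0)=1$ (so that $\Gamma(0)=0$) and no finite solution exists at $\theta=1>\theta_c$.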



We will break the proof of Theorem 27 into the proof of the lower bound, i.e.
Lemma 30, and the proof of the upper bound, i.e. Lemma 31.

Before we proceed, let us first prove Lemma 29, which will be repeatedly used.



Lemma 29. Consider a linear marked Hawkes process with intensity

(6.18)  $$\lambda_t:=\alpha+\beta Z_t:=\alpha+\beta\sum_{\tau_i<t}h(t-\tau_i,a_i),$$

and $\beta\mathbb{E}^q[H(a)]<1$, where the $a_i$ are i.i.d. random marks with the common law
$q(da)$ independent of the previous arrival times. Then there exists a unique invariant
measure $\pi$ for $Z_t$ such that

(6.19)  $$\int\lambda(z)\pi(dz)=\frac{\alpha}{1-\beta\mathbb{E}^q[H(a)]}.$$

Proof. The ergodicity of $Z_t$ is well known. Let $\pi$ be the invariant probability
measure for $Z_t$. Then

(6.20)  $$\int\lambda(z)\pi(dz)=\alpha+\beta\int_{\mathbb{X}}\int_0^\infty h(t,a)dt\,q(da)\int\lambda(z)\pi(dz).$$

□
Lemma 30 (Lower Bound).

(6.21)  $$\liminf_{t\to\infty}\frac{1}{t}\log\mathbb{E}[e^{\theta N_t}]\geq\begin{cases}\nu(f(\theta)-1)&\text{if }\theta\in(-\infty,\theta_c]\\+\infty&\text{otherwise,}\end{cases}$$

where $f(\theta)$ is the minimal solution to $x=\int_{\mathbb{X}}e^{\theta+H(a)(x-1)}q(da)$ and $\theta_c$ is defined
in (6.17).



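Proof. The intensity at time $t$ is $\lambda_t:=\lambda(Z_t)$ where $\lambda(z)=\nu+z$ and $Z_t=\sum_{\tau_i<t}h(t-\tau_i,a_i)$.
We tilt $\lambda$ to $\hat{\lambda}$ and $q$ to $\hat{q}$ such that by the Girsanov formula the
tilted probability measure $\hat{\mathbb{P}}$ is given by

(6.22)  $$\left.\frac{d\hat{\mathbb{P}}}{d\mathbb{P}}\right|_{\mathcal{F}_t}=\exp\left\{\int_0^t(\lambda(Z_s)-\hat{\lambda}(Z_s))ds+\int_0^t\left[\log\left(\frac{\hat{\lambda}(Z_s)}{\lambda(Z_s)}\right)+\log\left(\frac{d\hat{q}}{dq}\right)\right]dN_s\right\}.$$

Let $\mathcal{Q}_e$ be the set of $(\hat{\lambda},\hat{q},\hat{\pi})$ such that the marked Hawkes process with intensity
$\hat{\lambda}(Z_t)$ and random marks distributed as $\hat{q}$ is ergodic with $\hat{\pi}$ as the invariant measure
of $Z_t$.

By the ergodic theorem and Jensen's inequality, for any $(\hat{\lambda},\hat{q},\hat{\pi})\in\mathcal{Q}_e$,

(6.23)  $$\begin{aligned}\liminf_{t\to\infty}\frac{1}{t}\log\mathbb{E}[e^{\theta N_t}]&\geq\liminf_{t\to\infty}\hat{\mathbb{E}}\left[\frac{1}{t}\theta N_t-\frac{1}{t}\int_0^t(\lambda-\hat{\lambda})ds-\frac{1}{t}\int_0^t\left[\log(\hat{\lambda}/\lambda)+\log(d\hat{q}/dq)\right]\hat{\lambda}ds\right]\\&=\int\theta\hat{\lambda}\hat{\pi}(dz)+\int(\hat{\lambda}-\lambda)\hat{\pi}(dz)-\int\int\left[\log(\hat{\lambda}/\lambda)+\log(d\hat{q}/dq)\right]\hat{\lambda}\hat{q}\hat{\pi}(dz).\end{aligned}$$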
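Hence,

(6.24)  $$\begin{aligned}\liminf_{t\to\infty}\frac{1}{t}\log\mathbb{E}[e^{\theta N_t}]&\geq\sup_{(\hat{\lambda},\hat{q},\hat{\pi})\in\mathcal{Q}_e}\left\{\int\theta\hat{\lambda}\hat{\pi}+\int(\hat{\lambda}-\lambda)\hat{\pi}-\int\int\left[\log(\hat{\lambda}/\lambda)+\log(d\hat{q}/dq)\right]\hat{\lambda}\hat{q}\hat{\pi}\right\}\\&\geq\sup_{(\hat{\lambda},\hat{q},\hat{\pi})\in\mathcal{Q}_e}\int\left[(\theta-\mathbb{E}^{\hat{q}}[\log(d\hat{q}/dq)])\hat{\lambda}+\hat{\lambda}-\lambda-\hat{\lambda}\log(\hat{\lambda}/\lambda)\right]\hat{\pi}\\&\geq\sup_{0<K<\mathbb{E}^{\hat{q}}[H(a)]^{-1},\,(K\lambda,\hat{q},\hat{\pi})\in\mathcal{Q}_e}\int\left[(\theta-\mathbb{E}^{\hat{q}}[\log(d\hat{q}/dq)])+1-\frac{1}{K}-\log K\right]\hat{\lambda}\hat{\pi}\\&=\sup_{\hat{q}}\sup_{0<K<\mathbb{E}^{\hat{q}}[H(a)]^{-1}}\left[(\theta-\mathbb{E}^{\hat{q}}[\log(d\hat{q}/dq)])+1-\frac{1}{K}-\log K\right]\cdot\frac{K\nu}{1-K\mathbb{E}^{\hat{q}}[H(a)]},\end{aligned}$$

where the last equality is obtained by applying Lemma 29. The supremum over $\hat{q}$ is
taken over $\mathcal{M}(\mathbb{X})$ such that $\hat{q}$ is absolutely continuous w.r.t. $q$. Optimizing over
$K>0$, we get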



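(6.25)  $$\liminf_{t\to\infty}\frac{1}{t}\log\mathbb{E}[e^{\theta N_t}]\geq\sup_{\hat{q}}\begin{cases}\nu(f(\theta)-1)&\text{if }\theta\in\left(-\infty,\mathbb{E}^{\hat{q}}\left[\log\frac{d\hat{q}}{dq}\right]+\mathbb{E}^{\hat{q}}[H(a)]-1-\log\mathbb{E}^{\hat{q}}[H(a)]\right]\\+\infty&\text{otherwise,}\end{cases}$$

where $f(\theta)$ is the minimal solution to the equation

(6.26)  $$x=e^{\theta+\mathbb{E}^{\hat{q}}[\log(dq/d\hat{q})]+\mathbb{E}^{\hat{q}}[H(a)](x-1)}\leq\mathbb{E}^{\hat{q}}\left[e^{\theta+H(a)(x-1)}\frac{dq}{d\hat{q}}\right]=\int_{\mathbb{X}}e^{\theta+H(a)(x-1)}q(da).$$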
The last inequality is satisfied by Jensen's inequality; the equality holds if and only
if

(6.27)  $$\frac{d\hat{q}}{dq}=\frac{e^{H(a)(x-1)}}{\mathbb{E}^q[e^{H(a)(x-1)}]}.$$

Optimizing over $\hat{q}$, we get

(6.28)  $$\liminf_{t\to\infty}\frac{1}{t}\log\mathbb{E}[e^{\theta N_t}]\geq\begin{cases}\nu(f(\theta)-1)&\text{if }\theta\in(-\infty,\theta_c]\\+\infty&\text{otherwise,}\end{cases}$$

where $\theta_c$ is some critical value to be determined. Let

(6.29)  $$G(x)=e^{\theta}\int_{\mathbb{X}}e^{H(a)(x-1)}q(da)-x.$$

If $\theta=0$, then $G(x)=\int_{\mathbb{X}}e^{H(a)(x-1)}q(da)-x$ satisfies $G(1)=0$, $G(\infty)=\infty$ (by
(6.4)) and $G'(1)=\mathbb{E}^q[H(a)]-1<0$, which implies $\min_{x>1}G(x)<0$. Hence, there
exists some critical $\theta_c>0$ such that $\min_{x>1}G(x)=0$. The critical values $x_c$ and
$\theta_c$ satisfy $G(x_c)=G'(x_c)=0$, which implies

(6.30)  $$\theta_c=-\log\int_{\mathbb{X}}H(a)e^{H(a)(x_c-1)}q(da),$$

where $x_c>1$ satisfies the equation $x\int_{\mathbb{X}}H(a)e^{H(a)(x-1)}q(da)=\int_{\mathbb{X}}e^{H(a)(x-1)}q(da)$.

It is easy to check that indeed, for $dq_*=\frac{e^{H(a)(x_c-1)}}{\mathbb{E}^q[e^{H(a)(x_c-1)}]}dq$,

(6.31)  $$\mathbb{E}^{q_*}\left[\log\frac{dq_*}{dq}\right]+\mathbb{E}^{q_*}[H(a)]-1-\log\mathbb{E}^{q_*}[H(a)]=-\log\int_{\mathbb{X}}H(a)e^{H(a)(x_c-1)}q(da).$$

□
Lemma 31 (Upper Bound).

(6.32)  $$\limsup_{t\to\infty}\frac{1}{t}\log\mathbb{E}[e^{\theta N_t}]\leq\begin{cases}\nu(f(\theta)-1)&\text{if }\theta\in(-\infty,\theta_c]\\+\infty&\text{otherwise,}\end{cases}$$

where $f(\theta)$ is the minimal solution to $x=\int_{\mathbb{X}}e^{\theta+H(a)(x-1)}q(da)$ and $\theta_c$ is defined
in (6.17).



Proof. It is well known that a linear Hawkes process has an immigration-birth
representation. The immigrants (roots) arrive via a standard Poisson process
with constant intensity $\nu>0$. Each immigrant generates children according to
a Galton-Watson tree. (See for example Hawkes and Oakes [51] and Karabash
[63].) Consider a random, rooted tree (with root, i.e. immigrant, at time 0)
associated to the Hawkes process via the Galton-Watson interpretation. Note the
root is unmarked at the start of the process, so the marking goes into the
expectation calculation later. Let $K$ be the number of children of the root node, and
let $S^{(1)}_t,S^{(2)}_t,\ldots,S^{(K)}_t$ be such that $S^{(k)}_t$ is the number of descendants of the root's $k$-th child that were
born before time $t$ (including the $k$-th child if and only if it was born before time $t$).
Let $S_t$ be the total number of children in the tree before time $t$, including the root node.
Then

(6.33)  $$\begin{aligned}F_S(t)&:=\mathbb{E}[\exp(\theta S_t)]\\&=\sum_{k=0}^\infty\mathbb{E}[\exp(\theta S_t)|K=k]\mathbb{P}(K=k)\\&=\exp(\theta)\sum_{k=0}^\infty\mathbb{P}(K=k)\prod_{i=1}^k\mathbb{E}\left[\exp\left(\theta S^{(i)}_t\right)\right]\\&=\exp(\theta)\sum_{k=0}^\infty\mathbb{P}(K=k)\,\mathbb{E}\left[\exp\left(\theta S^{(1)}_t\right)\right]^k\\&=\sum_{k=0}^\infty\int_{\mathbb{X}}e^{\theta}\left[\int_0^t\frac{h(s,a)}{H(a)}F_S(t-s)ds+\int_t^\infty\frac{h(s,a)}{H(a)}ds\right]^ke^{-H(a)}\frac{H(a)^k}{k!}q(da)\\&=\int_{\mathbb{X}}\exp\left(\theta+\int_0^t h(s,a)(F_S(t-s)-1)ds\right)q(da).\end{aligned}$$

Now observe that $F_S(t)$ is strictly increasing and hence must approach the
smaller solution $x_*$ of the following equation

(6.34)  $$x=\int_{\mathbb{X}}\exp\left[\theta+H(a)(x-1)\right]q(da).$$

Finally, since random roots arrive according to a Poisson process with constant
intensity $\nu>0$, we have

(6.35)  $$F_N(t):=\mathbb{E}[\exp(\theta N_t)]=\exp\left[\nu\int_0^t(F_S(t-s)-1)ds\right].$$

But since $F_S(s)\uparrow x_*$ as $s\to\infty$ we obtain the main result

(6.36)  $$\frac{1}{t}\log F_N(t)=\nu\cdot\frac{1}{t}\int_0^t(F_S(s)-1)ds\to\nu(x_*-1),\qquad t\to\infty,$$

which proves the desired formula. Note that $x_*=\infty$ when there is no solution to
(6.34). The proof is complete. □



6.3.2 Large Deviation Principle 



In this section, we prove the main result, i.e. Theorem 25, by using the Gärtner-Ellis
theorem for the upper bound and the tilting method for the lower bound.



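Proof of Theorem 25. For the upper bound, since we have Theorem 27, we can
simply apply the Gärtner-Ellis theorem. To prove the lower bound, it suffices to show
that for any $x>0$, $\epsilon>0$, we have

(6.37)  $$\liminf_{t\to\infty}\frac{1}{t}\log\mathbb{P}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)\geq-\sup_{\theta}\{\theta x-\Gamma(\theta)\},$$

where $B_\epsilon(x)$ denotes the open ball centered at $x$ with radius $\epsilon$. Let $\hat{\mathbb{P}}$ denote the
tilted probability measure with rate $\hat{\lambda}$ and marks distributed by $\hat{q}(da)$ as defined
in Lemma 30. By Jensen's inequality,

(6.38)  $$\begin{aligned}\frac{1}{t}\log\mathbb{P}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)&\geq\frac{1}{t}\log\int_{\frac{N_t}{t}\in B_\epsilon(x)}\frac{d\mathbb{P}}{d\hat{\mathbb{P}}}d\hat{\mathbb{P}}\\&=\frac{1}{t}\log\hat{\mathbb{P}}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)+\frac{1}{t}\log\left[\frac{1}{\hat{\mathbb{P}}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)}\int_{\frac{N_t}{t}\in B_\epsilon(x)}\frac{d\mathbb{P}}{d\hat{\mathbb{P}}}d\hat{\mathbb{P}}\right]\\&\geq\frac{1}{t}\log\hat{\mathbb{P}}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)-\frac{1}{\hat{\mathbb{P}}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)}\cdot\frac{1}{t}\hat{\mathbb{E}}\left[1_{\frac{N_t}{t}\in B_\epsilon(x)}\log\frac{d\hat{\mathbb{P}}}{d\mathbb{P}}\right].\end{aligned}$$

By the ergodic theorem,

(6.39)  $$\liminf_{t\to\infty}\frac{1}{t}\log\mathbb{P}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)\geq-\inf_{(\hat{\lambda},\hat{q},\hat{\pi})\in\mathcal{Q}^x_e}\mathcal{H}(\hat{\lambda},\hat{q},\hat{\pi}),$$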
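where $\mathcal{Q}^x_e$ is defined by

(6.40)  $$\mathcal{Q}^x_e=\left\{(\hat{\lambda},\hat{q},\hat{\pi})\in\mathcal{Q}_e:\int\hat{\lambda}(z)\hat{\pi}(dz)=x\right\}$$

and the relative entropy $\mathcal{H}$ is

(6.41)  $$\mathcal{H}(\hat{\lambda},\hat{q},\hat{\pi})=\int(\lambda-\hat{\lambda})\hat{\pi}+\int\log(\hat{\lambda}/\lambda)\hat{\lambda}\hat{\pi}+\int\int\log(d\hat{q}/dq)\hat{q}\hat{\lambda}\hat{\pi}.$$

By Lemma 29,

(6.42)  $$\begin{aligned}\inf_{(\hat{\lambda},\hat{q},\hat{\pi})\in\mathcal{Q}^x_e}\mathcal{H}(\hat{\lambda},\hat{q},\hat{\pi})&\leq\inf_{0<K<\mathbb{E}^{\hat{q}}[H(a)]^{-1},\,\frac{K\nu}{1-K\mathbb{E}^{\hat{q}}[H(a)]}=x,\,(K\lambda,\hat{q},\hat{\pi})\in\mathcal{Q}_e}\mathcal{H}(K\lambda,\hat{q},\hat{\pi})\\&=\inf_{K=\frac{x}{x\mathbb{E}^{\hat{q}}[H(a)]+\nu},\,(K\lambda,\hat{q},\hat{\pi})\in\mathcal{Q}_e}\left\{\frac{1}{K}-1+\log K+\mathbb{E}^{\hat{q}}\left[\log\frac{d\hat{q}}{dq}\right]\right\}x\\&=\inf_{\hat{q}}\left\{x\mathbb{E}^{\hat{q}}[H(a)]+\nu-x+x\log\left(\frac{x}{x\mathbb{E}^{\hat{q}}[H(a)]+\nu}\right)+x\mathbb{E}^{\hat{q}}\left[\log\frac{d\hat{q}}{dq}\right]\right\}.\end{aligned}$$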



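Next, let us find a more explicit form for the Legendre-Fenchel transform of $\Gamma(\theta)$.

(6.43)  $$\sup_{\theta}\{\theta x-\Gamma(\theta)\}=\sup_{\theta}\{\theta x-\nu(f(\theta)-1)\},$$

where $f(\theta)=\mathbb{E}^q[e^{\theta+(f(\theta)-1)H(a)}]$. Here,

(6.44)  $$f'(\theta)=\mathbb{E}^q\left[(1+f'(\theta)H(a))e^{\theta+(f(\theta)-1)H(a)}\right].$$

So the optimal $\theta_*$ for (6.43) would satisfy $f'(\theta_*)=\frac{x}{\nu}$, and $\theta_*$ and $x_*=f(\theta_*)$ satisfy
the following equations

(6.45)  $$x_*=\mathbb{E}^q\left[e^{\theta_*+(x_*-1)H(a)}\right],\qquad\frac{x}{\nu}=x_*+\frac{x}{\nu}\mathbb{E}^q\left[H(a)e^{\theta_*+(x_*-1)H(a)}\right],$$

and $\sup_{\theta\in\mathbb{R}}\{\theta x-\Gamma(\theta)\}=\theta_*x-\nu(x_*-1)$.

On the other hand, letting $dq_*=\frac{e^{(x_*-1)H(a)}}{\mathbb{E}^q[e^{(x_*-1)H(a)}]}dq$, we have

(6.46)  $$\mathbb{E}^{q_*}[H(a)]=\frac{\mathbb{E}^q\left[H(a)e^{\theta_*+(x_*-1)H(a)}\right]}{\mathbb{E}^q\left[e^{\theta_*+(x_*-1)H(a)}\right]}=\frac{1}{x_*}-\frac{\nu}{x},$$

and $\mathbb{E}^{q_*}\left[\log\frac{dq_*}{dq}\right]=(x_*-1)\mathbb{E}^{q_*}[H(a)]-\log\mathbb{E}^q[e^{(x_*-1)H(a)}]$, which imply

(6.47)  $$\begin{aligned}\liminf_{t\to\infty}\frac{1}{t}\log\mathbb{P}\left(\frac{N_t}{t}\in B_\epsilon(x)\right)&\geq-\inf_{\hat{q}}\left\{x\mathbb{E}^{\hat{q}}[H(a)]+\nu-x+x\log\left(\frac{x}{x\mathbb{E}^{\hat{q}}[H(a)]+\nu}\right)+x\mathbb{E}^{\hat{q}}\left[\log\frac{d\hat{q}}{dq}\right]\right\}\\&\geq-\left\{x\mathbb{E}^{q_*}[H(a)]+\nu-x+x\log\left(\frac{x}{x\mathbb{E}^{q_*}[H(a)]+\nu}\right)+x\mathbb{E}^{q_*}\left[\log\frac{dq_*}{dq}\right]\right\}\\&=-\theta_*x+\nu(x_*-1)=-\sup_{\theta}\{\theta x-\Gamma(\theta)\}.\end{aligned}$$

□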
6.4 Risk Model with Marked Hawkes Claims Arrivals

We consider the following risk model for the surplus process $R_t$ of an insurance portfolio,

(6.48) $R_t = u+pt-\sum_{i=1}^{N_t}C_i$,

where $u > 0$ is the initial reserve, $p > 0$ is the constant premium and the $C_i$'s are i.i.d. positive random variables with the common distribution $\mu(dC)$. $C_i$ represents the claim size at the $i$th arrival time, these being independent of $N_t$, a marked Hawkes process.

For $u > 0$, let

(6.49) $\tau_u = \inf\{t > 0 : R_t < 0\}$,

and denote the infinite and finite horizon ruin probabilities by

(6.50) $\psi(u) = P(\tau_u < \infty)$, $\quad\psi(u,uz) = P(\tau_u \le uz)$, $\quad u, z > 0$.

By the law of large numbers,

(6.51) $\lim_{t\to\infty}\frac{1}{t}\sum_{i=1}^{N_t}C_i = \frac{E^\mu[C]\nu}{1-E^q[H(a)]}$.
Therefore, to exclude the trivial case, we need to assume that

(6.52) $\frac{E^\mu[C]\nu}{1-E^q[H(a)]} < p < \frac{\nu(x_c-1)}{\theta_c}$,

where the critical values $\theta_c$ and $x_c = f(\theta_c)$ satisfy

(6.53) $x_c = \int_{\mathbb{R}^+}\int_{\mathbb{X}}e^{\theta_cC+H(a)(x_c-1)}q(da)\mu(dC)$, $\quad 1 = \int_{\mathbb{R}^+}\int_{\mathbb{X}}H(a)e^{H(a)(x_c-1)+\theta_cC}q(da)\mu(dC)$.



Let us first assume that the claim sizes have light tails, i.e. there exists some $\theta > 0$ such that $\int_{\mathbb{R}^+}e^{\theta C}\mu(dC) < \infty$.



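Following the proofs of large deviation results in Section 6.3, we have

(6.54) $\Gamma_C(\theta) := \lim_{t\to\infty}\frac{1}{t}\log E\left[e^{\theta\sum_{i=1}^{N_t}C_i}\right] = \nu(x-1)$ if $\theta\in(-\infty,\theta_c]$, and $+\infty$ otherwise,

where $x$ is the minimal solution to the equation

(6.55) $x = \int_{\mathbb{R}^+}\int_{\mathbb{X}}e^{\theta C+(x-1)H(a)}q(da)\mu(dC)$.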



Before we proceed, let us quote a result from Glynn and Whitt [43], which will be used in our proof of Theorem 29.

Theorem 28 (Glynn and Whitt [43]). Let $S_n$ be random variables, $\tau_u = \inf\{n : S_n > u\}$ and $\psi(u) = P(\tau_u < \infty)$. Assume that there exist $\gamma, \epsilon > 0$ such that

(i) $\kappa_n(\theta) = \log E[e^{\theta S_n}]$ is well defined and finite for $\gamma-\epsilon < \theta < \gamma+\epsilon$.

(ii) $\limsup_{n\to\infty}E[e^{\theta(S_n-S_{n-1})}] < \infty$ for $-\epsilon < \theta < \epsilon$.

(iii) $\kappa(\theta) = \lim_{n\to\infty}\frac{1}{n}\kappa_n(\theta)$ exists and is finite for $\gamma-\epsilon < \theta < \gamma+\epsilon$.

(iv) $\kappa(\gamma) = 0$ and $\kappa$ is differentiable at $\gamma$ with $0 < \kappa'(\gamma) < \infty$.

Then, $\lim_{u\to\infty}\frac{1}{u}\log\psi(u) = -\gamma$.

Remark 15. We claim that $\Gamma_C(\theta) = p\theta$ has a unique positive solution $\theta^\dagger < \theta_c$. Let $G(\theta) = \Gamma_C(\theta)-p\theta$. Notice that $G(0) = 0$, $G(\infty) = \infty$, and that $G$ is convex. We also have $G'(0) = \frac{E^\mu[C]\nu}{1-E^q[H(a)]}-p < 0$ and $\Gamma_C(\theta_c)-p\theta_c > 0$ since we assume that $p < \frac{\nu(x_c-1)}{\theta_c}$. Therefore, there exists only one solution $\theta^\dagger\in(0,\theta_c)$ of $\Gamma_C(\theta^\dagger) = p\theta^\dagger$.



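Theorem 29 (Infinite Horizon). Assume all the assumptions in Theorem 25 and in addition (6.52). Then $\lim_{u\to\infty}\frac{1}{u}\log\psi(u) = -\theta^\dagger$, where $\theta^\dagger\in(0,\theta_c)$ is the unique positive solution of $\Gamma_C(\theta) = p\theta$.

Proof. Take $S_t = \sum_{i=1}^{N_t}C_i-pt$ and $\kappa_t(\theta) = \log E[e^{\theta S_t}]$. Then $\lim_{t\to\infty}\frac{1}{t}\kappa_t(\theta) = \Gamma_C(\theta)-p\theta$. Consider $\{S_{nh}\}_{n\in\mathbb{N}}$ for a fixed $h > 0$. We have $\lim_{n\to\infty}\frac{1}{n}\kappa_{nh}(\theta) = h\Gamma_C(\theta)-hp\theta$. Checking the conditions in Theorem 28 and applying it, we get

(6.56) $\lim_{u\to\infty}\frac{1}{u}\log P\left(\sup_{n\in\mathbb{N}}S_{nh} > u\right) = -\theta^\dagger$.

Finally, notice that

(6.57) $\sup_{t\in\mathbb{R}^+}S_t \ge \sup_{n\in\mathbb{N}}S_{nh} \ge \sup_{t\in\mathbb{R}^+}S_t-ph$.

Hence, $\lim_{u\to\infty}\frac{1}{u}\log\psi(u) = -\theta^\dagger$. □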



Theorem 30 (Finite Horizon). Under the same assumptions as in Theorem 29, we have

(6.58) $\lim_{u\to\infty}\frac{1}{u}\log\psi(u,uz) = -w(z)$, for any $z > 0$.

Here

(6.59) $w(z) = z\Lambda_C\left(\frac{1}{z}+p\right)$ if $0 < z < \frac{1}{\Gamma_C'(\theta^\dagger)-p}$, and $w(z) = \theta^\dagger$ if $z \ge \frac{1}{\Gamma_C'(\theta^\dagger)-p}$,

$\Lambda_C(x) = \sup_{\theta\in\mathbb{R}}\{\theta x-\Gamma_C(\theta)\}$ and $\theta^\dagger\in(0,\theta_c)$ is the unique positive solution of $\Gamma_C(\theta) = p\theta$, as before.

Proof. The proof is similar to that in Stabile and Torrisi [102] and we omit it here. □

Next, we study the case when the claim sizes have heavy tails, i.e. $\int_{\mathbb{R}^+}e^{\theta C}\mu(dC) = +\infty$ for any $\theta > 0$.

A distribution function $B$ is subexponential, i.e. $B\in\mathcal{S}$, if

(6.60) $\lim_{x\to\infty}\frac{P(C_1+C_2 > x)}{P(C_1 > x)} = 2$,

where $C_1$, $C_2$ are i.i.d. random variables with distribution function $B$. Let us denote $\bar{B}(x) := P(C_1 > x)$, assume that $E[C_1] < \infty$, and define $B_0(x) := \frac{1}{E[C_1]}\int_0^x\bar{B}(y)dy$, where $\bar{F}(x) = 1-F(x)$ is the complement of any distribution function $F(x)$.

Goldie and Resnick jH] showed that if $B\in\mathcal{S}$ and satisfies some smoothness conditions, then $B$ belongs to the maximum domain of attraction of either the Frechet distribution or the Gumbel distribution. In the former case, $\bar{B}$ is regularly varying, i.e. $\bar{B}(x) = L(x)/x^{\alpha+1}$, for some $\alpha > 0$ and we write it as $B\in\mathcal{R}(-\alpha-1)$, $\alpha > 0$.

We assume that $B_0\in\mathcal{S}$ and either $B\in\mathcal{R}(-\alpha-1)$ or $B\in\mathcal{G}$, i.e. the maximum domain of attraction of the Gumbel distribution. $\mathcal{G}$ includes Weibull and lognormal distributions.
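To make the defining limit (6.60) concrete, the following sketch evaluates the ratio $P(C_1+C_2>x)/P(C_1>x)$ numerically for one regularly varying claim distribution. The Pareto-type tail $(1+x)^{-\alpha}$, the value $\alpha=2$, the midpoint rule and all function names are illustrative assumptions, not taken from the text.

```python
def pareto_tail(x, alpha):
    """Assumed claim tail P(C > x) = (1 + x)^(-alpha), x >= 0."""
    return (1.0 + x) ** (-alpha)


def pareto_density(y, alpha):
    """Density matching pareto_tail."""
    return alpha * (1.0 + y) ** (-alpha - 1.0)


def sum_tail_ratio(x, alpha, n=200_000):
    """Return P(C1 + C2 > x) / P(C1 > x) using a midpoint rule.

    Conditioning on C1 = y gives
    P(C1 + C2 > x) = P(C1 > x) + int_0^x P(C2 > x - y) b(y) dy.
    """
    step = x / n
    conv = 0.0
    for k in range(n):
        y = (k + 0.5) * step
        conv += pareto_tail(x - y, alpha) * pareto_density(y, alpha)
    conv *= step
    return (pareto_tail(x, alpha) + conv) / pareto_tail(x, alpha)


if __name__ == "__main__":
    for level in (20.0, 200.0):
        print(level, sum_tail_ratio(level, alpha=2.0))
```

For this choice the ratio stays above 2 and moves toward 2 as the level grows, which is the behaviour that (6.60) requires of a subexponential distribution.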

When the arrival process $N_t$ satisfies a large deviation result, the probability that it deviates away from its mean is exponentially small, which is dominated by subexponential distributions. By using the techniques for the asymptotics of ruin probabilities for risk processes with non-stationary, non-renewal arrivals and subexponential claims from Zhu [117], we have the following infinite-horizon and finite-horizon ruin probability estimates when the claim sizes are subexponential.



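Theorem 31. Assume the net profit condition $p > E[C_1]\frac{\nu}{1-E^q[H(a)]}$.

(i) (Infinite-Horizon)

(6.61) $\lim_{u\to\infty}\frac{\psi(u)}{\bar{B}_0(u)} = \frac{\nu E[C_1]}{p(1-E^q[H(a)])-\nu E[C_1]}$.

(ii) (Finite-Horizon) For any $T > 0$,

(6.62) $\lim_{u\to\infty}\frac{\psi(u,uz)}{\bar{B}_0(u)} = \frac{\nu E[C_1]}{p(1-E^q[H(a)])-\nu E[C_1]}\left[1-\left(1+\frac{p(1-E^q[H(a)])-\nu E[C_1]}{p(1-E^q[H(a)])}\cdot\frac{T}{\alpha}\right)^{-\alpha}\right]$ if $B\in\mathcal{R}(-\alpha-1)$,

and

$\lim_{u\to\infty}\frac{\psi(u,uz)}{\bar{B}_0(u)} = \frac{\nu E[C_1]}{p(1-E^q[H(a)])-\nu E[C_1]}\left[1-e^{-\frac{p(1-E^q[H(a)])-\nu E[C_1]}{p(1-E^q[H(a)])}T}\right]$ if $B\in\mathcal{G}$.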



6.5 Examples with Explicit Formulas

In this section, we discuss two examples where an explicit formula exists. Example 1 is about the exponential asymptotics of the infinite-horizon ruin probability when $H(a)$ and the claim size $C$ are exponentially distributed. Example 2 gives an explicit expression for the rate function of the large deviation principle when $H(a)$ is exponentially distributed.

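Example 1. Recall that $x$ is the minimal solution of

(6.63) $x = \int_{\mathbb{R}^+}\int_{\mathbb{X}}e^{\theta C+(x-1)H(a)}q(da)\mu(dC)$.

Now, assume that $H(a)$ is exponentially distributed with parameter $\lambda > 0$; then, we have

(6.64) $x = E^\mu[e^{\theta C}]\frac{\lambda}{\lambda-(x-1)}$,

which implies that

(6.65) $x = \frac{1}{2}\left\{\lambda+1-\sqrt{(\lambda+1)^2-4\lambda E^\mu[e^{\theta C}]}\right\}$.

Now, assume that $C$ is exponentially distributed with parameter $\gamma > 0$. Then,

(6.66) $x = \frac{1}{2}\left(\lambda+1-\sqrt{(\lambda+1)^2-4\lambda\frac{\gamma}{\gamma-\theta}}\right)$.

The infinite horizon ruin probability satisfies $\lim_{u\to\infty}\frac{1}{u}\log\psi(u) = -\theta^\dagger$, where $\theta^\dagger$ satisfies

(6.67) $p\theta^\dagger = \nu\left[\frac{1}{2}\left(\lambda+1-\sqrt{(\lambda+1)^2-4\lambda\frac{\gamma}{\gamma-\theta^\dagger}}\right)-1\right]$,

which implies

(6.68) $\frac{2p}{\nu}\theta^\dagger+1-\lambda = -\sqrt{(\lambda+1)^2-\frac{4\lambda\gamma}{\gamma-\theta^\dagger}}$,

and thus

(6.69) $\frac{p^2}{\nu^2}(\theta^\dagger)^2+\frac{p}{\nu}(1-\lambda)\theta^\dagger = \lambda-\frac{\lambda\gamma}{\gamma-\theta^\dagger} = -\frac{\lambda\theta^\dagger}{\gamma-\theta^\dagger}$.

Since we are looking for positive $\theta^\dagger$, we get the quadratic equation,

(6.70) $p^2(\theta^\dagger)^2-(p^2\gamma-p\nu(1-\lambda))\theta^\dagger-(p\nu\gamma(1-\lambda)+\lambda\nu^2) = 0$.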

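Since $p > \frac{E^\mu[C]\nu}{1-E^q[H(a)]} = \frac{\lambda\nu}{\gamma(\lambda-1)}$ (here $E^q[H(a)] = 1/\lambda < 1$, so $\lambda > 1$), we have $p\nu\gamma(1-\lambda)+\lambda\nu^2 < 0$, so both roots of (6.70) are positive. Squaring (6.68) can introduce a spurious root: (6.68) forces $\frac{2p}{\nu}\theta^\dagger+1-\lambda \le 0$, and under (6.52) only the smaller root satisfies this together with $\theta^\dagger < \theta_c$. Therefore,

(6.71) $\theta^\dagger = \frac{(p^2\gamma-p\nu(1-\lambda))-\sqrt{(p^2\gamma-p\nu(1-\lambda))^2+4p^2(p\nu\gamma(1-\lambda)+\lambda\nu^2)}}{2p^2}$.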
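As a numerical sanity check on Example 1, the sketch below solves $\Gamma_C(\theta) = p\theta$ by bisection on $(0,\theta_c)$ and compares the result with the two roots of the quadratic (6.70). The parameter values and function names are illustrative assumptions. The closed form $\theta_c = \gamma(\lambda-1)^2/(\lambda+1)^2$ is obtained here by setting the discriminant in (6.66) to zero; it is not stated in the text.

```python
import math


def gamma_c(theta, lam, gam, nu):
    """Gamma_C(theta) = nu * (x - 1), with x the minimal root in (6.66)."""
    if theta >= gam:
        return math.inf
    disc = (lam + 1.0) ** 2 - 4.0 * lam * gam / (gam - theta)
    if disc < 0.0:
        return math.inf
    return nu * (0.5 * (lam + 1.0 - math.sqrt(disc)) - 1.0)


def theta_critical(lam, gam):
    """theta_c solving (lam + 1)^2 = 4 * lam * gam / (gam - theta)."""
    return gam * (lam - 1.0) ** 2 / (lam + 1.0) ** 2


def theta_dagger_bisect(p, lam, gam, nu, iters=200):
    """Root of G(theta) = Gamma_C(theta) - p * theta in (0, theta_c).

    Relies on G being negative just to the right of 0 and positive
    at theta_c, as in Remark 15.
    """
    hi = theta_critical(lam, gam)
    lo = 1e-6 * hi
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gamma_c(mid, lam, gam, nu) - p * mid < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def quadratic_roots(p, lam, gam, nu):
    """Both roots (smaller, larger) of the quadratic equation (6.70)."""
    b = p * p * gam - p * nu * (1.0 - lam)
    c0 = p * nu * gam * (1.0 - lam) + lam * nu * nu
    root = math.sqrt(b * b + 4.0 * p * p * c0)
    return (b - root) / (2.0 * p * p), (b + root) / (2.0 * p * p)


if __name__ == "__main__":
    # Hypothetical parameters satisfying (6.52): 2 < p < 4.5.
    p, lam, gam, nu = 3.0, 2.0, 1.0, 1.0
    print(theta_dagger_bisect(p, lam, gam, nu), quadratic_roots(p, lam, gam, nu))
```

With these hypothetical parameters the bisection value agrees with the smaller root of (6.70), while the larger root lies above $\theta_c$ and is therefore not admissible.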

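Example 2. Now, let $H(a)$ be exponentially distributed with parameter $\lambda > 0$. We want an explicit expression for the rate function of the large deviation principle for $(N_t/t\in\cdot)$. Notice that,

(6.72) $\Gamma(\theta) = \nu\left(\frac{1}{2}\left\{\lambda+1-\sqrt{(\lambda+1)^2-4\lambda e^\theta}\right\}-1\right)$ for $\theta \le \log\left(\frac{(\lambda+1)^2}{4\lambda}\right)$, and $+\infty$ otherwise.

To get $I(x) = \sup_{\theta\in\mathbb{R}}\{\theta x-\Gamma(\theta)\}$, we optimize over $\theta$ and consider $x = \Gamma'(\theta)$. Evidently,

(6.73) $x = -\frac{\nu}{2}\cdot\frac{(-4\lambda)e^\theta}{2\sqrt{(\lambda+1)^2-4\lambda e^\theta}}$,

which gives us

(6.74) $\theta = \log\left(\frac{-2x^2+x\sqrt{4x^2+\nu^2(\lambda+1)^2}}{\lambda\nu^2}\right)$,

whence,

(6.75) $I(x) = x\log\left(\frac{-2x^2+x\sqrt{4x^2+\nu^2(\lambda+1)^2}}{\lambda\nu^2}\right)-\nu\left(\frac{1}{2}\left\{\lambda+1-\frac{1}{\nu}\left(-2x+\sqrt{4x^2+\nu^2(\lambda+1)^2}\right)\right\}-1\right)$ if $x \ge 0$ (with the convention $0\log 0 = 0$), and $I(x) = +\infty$ otherwise.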
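The closed form in Example 2 can be compared with a direct numerical Legendre transform of (6.72). The sketch below does this with a ternary search over $\theta$; the parameter values, the search interval and the function names are illustrative assumptions, and the value $I(0)=\nu$ is the limit of the closed form as $x\downarrow 0$.

```python
import math


def gamma(theta, lam, nu):
    """Limiting log moment generating function (6.72)."""
    if theta > math.log((lam + 1.0) ** 2 / (4.0 * lam)):
        return math.inf
    disc = max((lam + 1.0) ** 2 - 4.0 * lam * math.exp(theta), 0.0)
    return nu * (0.5 * (lam + 1.0 - math.sqrt(disc)) - 1.0)


def rate_closed_form(x, lam, nu):
    """Rate function I(x) as in (6.74)-(6.75)."""
    if x < 0.0:
        return math.inf
    if x == 0.0:
        return nu
    root = math.sqrt(4.0 * x * x + nu * nu * (lam + 1.0) ** 2)
    theta = math.log((-2.0 * x * x + x * root) / (lam * nu * nu))
    return x * theta - nu * (0.5 * (lam + 1.0 - (root - 2.0 * x) / nu) - 1.0)


def rate_numeric(x, lam, nu, lo=-50.0, iters=300):
    """sup over theta of theta * x - Gamma(theta), by ternary search.

    The objective is concave on (-inf, theta_c], so ternary search applies.
    """
    hi = math.log((lam + 1.0) ** 2 / (4.0 * lam))

    def objective(theta):
        return theta * x - gamma(theta, lam, nu)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


if __name__ == "__main__":
    lam, nu = 2.0, 1.0  # hypothetical values; the mean rate is nu / (1 - 1/lam) = 2
    for x in (0.5, 1.0, 2.0, 3.0, 5.0):
        print(x, rate_closed_form(x, lam, nu), rate_numeric(x, lam, nu))
```

The rate function vanishes at the law-of-large-numbers limit $x = \nu/(1-1/\lambda)$, which gives a quick check on both implementations.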
Appendix A

Proof of Theorem 3



Since a Hawkes process has a long memory and is in general non-Markovian, there is no good criterion in the literature for moderate deviations that we can use directly. For example, Bacry et al. [2] used a central limit theorem for martingales to obtain a central limit theorem for linear Hawkes processes. But there is no criterion for moderate deviations for martingales that can fit into the context of Hawkes processes. Our strategy relies on the fact that for linear Hawkes processes there is a nice immigration-birth representation from which we can obtain a semi-explicit formula for the moment generating function of $N_t$ in Lemma 32. A careful asymptotic analysis of this formula would lead to the proof of Theorem 3.

Proof of Theorem 3. Let us first prove that for any $\theta\in\mathbb{R}$,

(A.1) $\lim_{t\to\infty}\frac{t}{a(t)^2}\log E\left[e^{\frac{a(t)}{t}\theta(N_t-\mu t)}\right] = \frac{\nu\theta^2}{2(1-\|h\|_{L^1})^3}$.



By Lemma 32, for fixed $\theta\in\mathbb{R}$ and $t$ sufficiently large, we have

(A.2) $E\left[e^{\frac{a(t)}{t}\theta N_t}\right] = e^{\nu\int_0^tG_t(s)ds}$,

where $G_t(s) = e^{\frac{a(t)}{t}\theta+\int_0^sh(u)G_t(s-u)du}-1$, $0 \le s \le t$. Here, $G_t(s)$ is simply the $F(s)-1$ in Lemma 32. Because $\frac{a(t)}{t}\theta$ depends on $t$, we write $G_t(s)$ instead of $G(s)$ to indicate its dependence on $t$. Clearly, $G_t(s)$ is increasing in $s$ and $G_t(\infty) = x_t$ is the minimal solution to the equation $x_t = e^{\frac{a(t)}{t}\theta+\|h\|_{L^1}x_t}-1$. (See the proof of Lemma 32 and the reference therein.) Since $\|h\|_{L^1} < 1$, it is easy to see that $x_t = O(a(t)/t)$. Since $x_t = O(a(t)/t)$, we have $G_t(s) = O(a(t)/t)$ uniformly in $s$.
By Taylor's expansion,

(A.3) $G_t(s) = \frac{a(t)\theta}{t}+\int_0^sh(u)G_t(s-u)du+\frac{1}{2}\left(\frac{a(t)\theta}{t}\right)^2+\frac{1}{2}\left(\int_0^sh(u)G_t(s-u)du\right)^2+\frac{a(t)\theta}{t}\int_0^sh(u)G_t(s-u)du+O\left((a(t)/t)^3\right)$.

Let $G_t(s) = \frac{a(t)}{t}\theta G_1(s)+\left(\frac{a(t)}{t}\right)^2G_2(s)+\epsilon_t(s)$, where

(A.4) $G_1(s) := 1+\int_0^sh(u)G_1(s-u)du$,

and

(A.5) $G_2(s) := \int_0^sh(u)G_2(s-u)du+\frac{\theta^2}{2}+\theta^2(G_1(s)-1)+\frac{\theta^2}{2}(G_1(s)-1)^2$.

Substituting (A.4) and (A.5) back into (A.3) and using the fact $G_t(s) = O(a(t)/t)$ uniformly in $s$, we get $\epsilon_t(s) = O((a(t)/t)^3)$ uniformly in $s$. Moreover, we claim that

(A.6) $\lim_{t\to\infty}\frac{1}{a(t)}\left[\theta\nu\int_0^tG_1(s)ds-\theta\mu t\right] = 0$,

(A.7) $\lim_{t\to\infty}\frac{1}{t}\int_0^tG_2(s)ds = \frac{\theta^2}{2(1-\|h\|_{L^1})^3}$.
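To prove (A.6), notice first that

(A.8) $\frac{1}{t}\int_0^tG_1(s)ds = 1+\frac{1}{t}\int_0^t\int_0^sh(u)G_1(s-u)duds$
$= 1+\frac{1}{t}\int_0^th(u)\int_u^tG_1(s-u)dsdu$
$= 1+\frac{1}{t}\int_0^th(u)\int_0^{t-u}G_1(s)dsdu$
$= 1+\frac{1}{t}\int_0^th(u)\int_0^tG_1(s)dsdu-\frac{1}{t}\int_0^th(u)\int_{t-u}^tG_1(s)dsdu$.

Therefore,

(A.9) $\frac{1}{t}\int_0^tG_1(s)ds = \frac{1-\frac{1}{t}\int_0^th(u)\int_{t-u}^tG_1(s)dsdu}{1-\int_0^th(u)du}$.

Hence,

(A.10) $\frac{1}{a(t)}\left[\theta\nu\int_0^tG_1(s)ds-\theta\mu t\right] = \frac{\theta\nu}{a(t)}\int_0^t\left[G_1(s)-\frac{1}{1-\|h\|_{L^1}}\right]ds$
$= \frac{\theta\nu t}{a(t)}\left[\frac{1}{1-\int_0^th(u)du}-\frac{1}{1-\int_0^\infty h(u)du}\right]-\frac{\theta\nu}{a(t)}\cdot\frac{\int_0^th(u)\int_{t-u}^tG_1(s)dsdu}{1-\int_0^th(u)du}$.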
For the first term in (A.10), we have

(A.11) $\left|\frac{\theta\nu t}{a(t)}\left[\frac{1}{1-\int_0^th(u)du}-\frac{1}{1-\int_0^\infty h(u)du}\right]\right| \le \frac{|\theta|\nu t\int_t^\infty h(u)du}{a(t)(1-\|h\|_{L^1})^2} \to 0$,

as $t\to\infty$, since by our assumption, $\sup_{t>0}t^{3/2}h(t) \le C < \infty$, which implies that

$\frac{t}{a(t)}\int_t^\infty h(u)du \le \frac{t}{a(t)}\int_t^\infty\frac{C}{u^{3/2}}du = \frac{2Ct^{1/2}}{a(t)} \to 0$ as $t\to\infty$.

For the second term in (A.10), we have

(A.12) $\limsup_{t\to\infty}\left|\frac{\theta\nu}{a(t)}\cdot\frac{\int_0^th(u)\int_{t-u}^tG_1(s)dsdu}{1-\int_0^th(u)du}\right| \le \lim_{t\to\infty}G_1(t)\cdot\limsup_{t\to\infty}\frac{|\theta|\nu}{a(t)}\cdot\frac{\int_0^th(u)udu}{1-\|h\|_{L^1}} = 0$.

This is because (A.4) is a renewal equation and $\|h\|_{L^1} < 1$. By the application of the Tauberian theorem to the renewal equation (see Chapters XIII and XIV of Feller [38]), $\lim_{t\to\infty}G_1(t) = \frac{1}{1-\|h\|_{L^1}}$. Moreover, our assumptions $\sup_{t>0}t^{3/2}h(t) \le C < \infty$ and $\|h\|_{L^1} < \infty$ imply that

(A.13) $\frac{1}{a(t)}\int_0^th(u)udu \le \frac{1}{a(t)}\int_0^1h(u)udu+\frac{1}{a(t)}\int_1^t\frac{C}{u^{1/2}}du \to 0$,

as $t\to\infty$.
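The limit $\lim_{t\to\infty}G_1(t) = 1/(1-\|h\|_{L^1})$ used above can be illustrated numerically. The sketch below discretizes the renewal equation (A.4) with the trapezoid rule for an exponential kernel $h(u) = \alpha\beta e^{-\beta u}$, so that $\|h\|_{L^1} = \alpha$. The kernel, the parameter values and the function names are assumptions for illustration. For this kernel a Laplace-transform computation (not part of the text) gives $G_1(s) = \frac{1}{1-\alpha}-\frac{\alpha}{1-\alpha}e^{-\beta(1-\alpha)s}$, which serves as a reference.

```python
import math


def solve_g1(alpha, beta, horizon, n):
    """Trapezoid-rule solution of G1(s) = 1 + int_0^s h(u) G1(s - u) du.

    Assumes the kernel h(u) = alpha * beta * exp(-beta * u).
    Returns the values of G1 on the grid s_i = i * horizon / n.
    """
    dt = horizon / n
    h = [alpha * beta * math.exp(-beta * k * dt) for k in range(n + 1)]
    g = [1.0] * (n + 1)
    for i in range(1, n + 1):
        # Interior and right-endpoint terms of the trapezoid sum.
        acc = 0.5 * h[i] * g[0]
        for k in range(1, i):
            acc += h[k] * g[i - k]
        # The left-endpoint term 0.5 * dt * h[0] * g[i] is solved implicitly.
        g[i] = (1.0 + dt * acc) / (1.0 - 0.5 * dt * h[0])
    return g


def g1_exact(alpha, beta, s):
    """Reference solution for the exponential kernel (Laplace transform)."""
    return 1.0 / (1.0 - alpha) - alpha / (1.0 - alpha) * math.exp(-beta * (1.0 - alpha) * s)


if __name__ == "__main__":
    values = solve_g1(alpha=0.5, beta=1.0, horizon=40.0, n=2000)
    print(values[-1], 1.0 / (1.0 - 0.5))
```

The numerical solution is increasing in $s$ and levels off at $1/(1-\alpha)$, in line with the Tauberian argument.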



To prove (A.7), notice that $\lim_{t\to\infty}G_1(t) = \frac{1}{1-\|h\|_{L^1}}$ and again by the application of the Tauberian theorem to the renewal equation (see Chapters XIII and XIV of Feller [38]), we have

(A.14) $\lim_{t\to\infty}\frac{1}{t}\int_0^tG_2(s)ds = \lim_{t\to\infty}G_2(t) = \frac{\theta^2}{2}\cdot\frac{1+2\left(\frac{1}{1-\|h\|_{L^1}}-1\right)+\left(\frac{1}{1-\|h\|_{L^1}}-1\right)^2}{1-\|h\|_{L^1}} = \frac{\theta^2}{2(1-\|h\|_{L^1})^3}$.



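Finally, from (A.2) and the definitions of $G_1(s)$, $G_2(s)$ and $\epsilon_t(s)$, we have

(A.15) $\frac{t}{a(t)^2}\log E\left[e^{\frac{a(t)}{t}\theta(N_t-\mu t)}\right] = \frac{t}{a(t)^2}\left[\nu\int_0^tG_t(s)ds-\theta\mu a(t)\right]$
$= \frac{1}{a(t)}\left[\nu\theta\int_0^tG_1(s)ds-\theta\mu t\right]+\frac{1}{t}\nu\int_0^tG_2(s)ds+\frac{t}{a(t)^2}\nu\int_0^t\epsilon_t(s)ds$.

Hence, by (A.6), (A.7) and the fact that $\epsilon_t(s) = O((a(t)/t)^3)$ uniformly in $s$, we conclude that, for any $\theta\in\mathbb{R}$,

(A.16) $\lim_{t\to\infty}\frac{t}{a(t)^2}\log E\left[e^{\frac{a(t)}{t}\theta(N_t-\mu t)}\right] = \frac{\nu\theta^2}{2(1-\|h\|_{L^1})^3}$.

Applying the Gärtner-Ellis theorem (see for example [30]), we conclude that, for any Borel set $A$,

(A.17) $-\inf_{x\in A^o}J(x) \le \liminf_{t\to\infty}\frac{t}{a(t)^2}\log P\left(\frac{N_t-\mu t}{a(t)}\in A\right) \le \limsup_{t\to\infty}\frac{t}{a(t)^2}\log P\left(\frac{N_t-\mu t}{a(t)}\in A\right) \le -\inf_{x\in\bar{A}}J(x)$,

where

(A.18) $J(x) = \sup_{\theta\in\mathbb{R}}\left\{\theta x-\frac{\nu\theta^2}{2(1-\|h\|_{L^1})^3}\right\} = \frac{x^2(1-\|h\|_{L^1})^3}{2\nu}$.

□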
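The closed form in (A.18) is the Legendre transform of a quadratic, attained at $\theta = x(1-\|h\|_{L^1})^3/\nu$. A small numerical sketch of this step follows; the parameter values, the search interval and the function names are illustrative assumptions.

```python
def limit_cumulant(theta, nu, h_norm):
    """Right-hand side of (A.16): nu * theta^2 / (2 * (1 - ||h||)^3)."""
    return nu * theta ** 2 / (2.0 * (1.0 - h_norm) ** 3)


def rate_closed_form(x, nu, h_norm):
    """J(x) from (A.18)."""
    return x ** 2 * (1.0 - h_norm) ** 3 / (2.0 * nu)


def rate_numeric(x, nu, h_norm, lo=-100.0, hi=100.0, iters=300):
    """sup over theta of theta * x - limit_cumulant(theta), by ternary search."""

    def objective(theta):
        return theta * x - limit_cumulant(theta, nu, h_norm)

    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


if __name__ == "__main__":
    nu, h_norm = 1.0, 0.5  # hypothetical values with ||h|| < 1
    for x in (-2.0, 0.3, 1.5):
        print(x, rate_closed_form(x, nu, h_norm), rate_numeric(x, nu, h_norm))
```

The search interval must contain the maximizer $x(1-\|h\|_{L^1})^3/\nu$; the default bounds are ample for the values shown.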

Lemma 32. For $\theta \le \|h\|_{L^1}-1-\log\|h\|_{L^1}$,

(A.19) $E[e^{\theta N_t}] = e^{\nu\int_0^t(F(t-s)-1)ds}$,

where $F(s) = e^{\theta+\int_0^sh(u)(F(s-u)-1)du}$ for any $0 \le s \le t$.

Proof. The Hawkes process has a very nice immigration-birth representation, see for example Hawkes and Oakes [54]. The immigrants arrive according to a homogeneous Poisson process with constant rate $\nu$. Each immigrant produces a number of children, this being Poisson distributed with parameter $\|h\|_{L^1}$. Conditional on the number of the children of an immigrant, the time that a child is born has probability density function $\frac{h(t)}{\|h\|_{L^1}}$. Each child produces children according to the same laws independent of other children. All the immigrants produce children independently. Let $F(t) = E[e^{\theta S(t)}]$, where $S(t)$ is the number of descendants an immigrant generates up to time $t$. Hence, we have

(A.20) $E[e^{\theta N_t}] = \sum_{k=0}^\infty e^{-\nu t}\frac{(\nu t)^k}{k!}\cdot\frac{k!}{t^k}\int\cdots\int_{t_1<t_2<\cdots<t_k}F(t-t_1)\cdots F(t-t_k)dt_1\cdots dt_k = e^{\nu\int_0^t(F(s)-1)ds}$.

By page 39 of Jagers [58], for all $\theta\in(-\infty,\|h\|_{L^1}-1-\log\|h\|_{L^1}]$, $E[e^{\theta S(\infty)}]$ is the minimal positive solution of

(A.21) $E[e^{\theta S(\infty)}] = e^\theta\exp\left\{\|h\|_{L^1}\left(E[e^{\theta S(\infty)}]-1\right)\right\}$.

Let $K$ be the number of children of an immigrant and let $S_t^{(1)}, S_t^{(2)}, \ldots, S_t^{(K)}$ be the number of descendants of the immigrant's $k$th child that were born before time $t$ (including the $k$th child if and only if it was born before time $t$). Then

(A.22) $F(t) = \sum_{k=0}^\infty E[e^{\theta S(t)}\,|\,K=k]P(K=k)$
$= \sum_{k=0}^\infty e^\theta E\left[e^{\theta\sum_{i=1}^kS_t^{(i)}}\right]P(K=k)$
$= \sum_{k=0}^\infty e^\theta\left(\int_0^t\frac{h(s)}{\|h\|_{L^1}}F(t-s)ds+\int_t^\infty\frac{h(s)}{\|h\|_{L^1}}ds\right)^ke^{-\|h\|_{L^1}}\frac{\|h\|_{L^1}^k}{k!}$
$= e^{\theta+\int_0^th(s)(F(t-s)-1)ds}$.

□
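The role of the threshold $\|h\|_{L^1}-1-\log\|h\|_{L^1}$ in Lemma 32 can be seen from the fixed-point equation (A.21). The sketch below computes its minimal positive solution by monotone iteration started at 0, and reports infinity when the iteration escapes, which is what happens above the threshold. The value $\|h\|_{L^1} = 0.5$, the tolerances and the function names are illustrative assumptions.

```python
import math


def critical_theta(h_norm):
    """Threshold ||h|| - 1 - log ||h|| from Lemma 32."""
    return h_norm - 1.0 - math.log(h_norm)


def minimal_root(theta, h_norm, tol=1e-13, max_iter=1_000_000):
    """Minimal positive solution of x = exp(theta + h_norm * (x - 1)).

    Iterates x -> exp(theta + h_norm * (x - 1)) from x = 0, which increases
    monotonically to the minimal fixed point when one exists. Returns
    math.inf if the iterates blow up. Convergence is slow at the threshold.
    """
    x = 0.0
    for _ in range(max_iter):
        arg = theta + h_norm * (x - 1.0)
        if arg > 700.0:  # math.exp would overflow: no finite fixed point reached
            return math.inf
        new_x = math.exp(arg)
        if abs(new_x - x) < tol:
            return new_x
        x = new_x
    return x


if __name__ == "__main__":
    h_norm = 0.5
    theta_c = critical_theta(h_norm)
    print(minimal_root(0.0, h_norm))            # theta = 0 gives x = 1
    print(minimal_root(theta_c - 0.05, h_norm))  # finite, below 1 / h_norm
    print(minimal_root(theta_c + 0.1, h_norm))   # no solution: inf
```

At the threshold itself the two roots of the equation merge at $x = 1/\|h\|_{L^1}$, which can be checked by substitution.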
Appendix B

Proof of Theorem 18

Let $P^n$ denote the probability measure under which $N_t$ follows the Hawkes process with exciting function $h_n = \sum_{i=1}^na_ie^{-b_it}$ such that $h_n\to h$ as $n\to\infty$ in both $L^1$ and $L^\infty$ norms. We can find such a sequence $h_n$ by Lemma 35. Let us define

(B.1) $\Gamma_n(\theta) = \lim_{t\to\infty}\frac{1}{t}\log E^{P^n}\left[e^{\theta N_t}\right]$.

We have the following results.

Lemma 33. For any $K > 0$ and $\theta_1,\theta_2\in[-K,K]$, there exists some constant $C(K)$ such that for any $n$,

(B.2) $|\Gamma_n(\theta_1)-\Gamma_n(\theta_2)| \le C(K)|\theta_1-\theta_2|$.
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Proof. Without loss of generality, take 9 2 > 6\. Then 



(B.3) r n (0x) < T n {6 2 \ 

= sup / (0, - 0. )\n - { \- - H{\. k) 
(A,*)es* 



< sup f(e 2 -9 1 )\7t + r n (e 1 ) 



(A,*)eQ* 



where 



(B.4) 



Ql = |(A,7r) e Q e : fe 1 \7r - H(\,jt) > r n (0O - 1| . 



The key is to prove that sup^^g* J Xk < C(K) for some positive constant C(K) 
depending only on K. Define u = u(zi, . . . , z n ) = e^^ lCiZl where 



3K 1 



( B -5) (k = n ai ■-, l<i<n. 



Define V — — — such that 



3K n 
(B.6) V(z 1: ...,z n ) = =^-^r J> - K*i + ■■■ + Zn)(e 3K - 1). 

2-/i=l bi j=i 

Notice that J Afii = for any test function / with certain regularities. If we try 
/ = g, 1 < i < n, we get 



(B.7) / ZiTt + ^ / Att = 0, 1 < i < n. 
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Summing over 1 < % < n, we get 



(B-8) 



Att 



Z-^i=l h 



n 



Ziir. 



i=l 



Notice that Y17=i f 1 = ll^nlU 1 which is approximately H^Hl 1 when n is large. Since 



lim sup 2 



(B.9) 



X(z) 



and Y^i=\ z i — O5 we have 



9i J Att < K f Att = ^-^ J f^ *& < \ J Vit + C 1/2 (K), 



where C\/2{K) is some positive constant depending only on K. 

We claim that J V(z)n < H{tx) for any i\ e Q*. Let us prove it. By the ergodic 
theorem and Jensen's inequality, 
(B.10) 



V(z)n = lim E n 

t— ¥00 



V{Z s )ds 



< lim sup - log E" 



J* V(Z a )ds 



H[ 



TV 



Next, we will show that u > 1. That is equivalent to proving YH=\ f 1 — 0- Consider 
the process 



(B.ll) 



«=E^ = EE 



Ic-bitt-Tj) 



J2 9 ^~ T i^ 



i=i 



T,-<t 1=1 



T,-<t 



where #(£) = £? =1 ^e _6i *. Notice that #(£) = f t °° h(s)ds > 0. Therefore, Y t > 
almost surely and X^=i ~b"^ — 0- Since — + V = and w > 1, by Feynman-Kac 
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formula and Dynkin's formula, 



(B.12) E 71 



J * V(Z s )ds 



< E n a- 
= u(Z 



u(Z ) 



' Z,) e ^ v{Zs)ds 



W 



(Au(Z s ) + V(Z s )u(Z s ))e^ viz - )du 



ds 



and therefore J V(z)tt < H(tt) for any 7T6Q*. Hence, 



(B.13) 



h I \it<l f V(z) + C 1/2 (K) <\h + C 1/2 (K). 



Notice that 



(B.14) 



oo < TJOi) - 1 < 0i Xn-H < r n (0i) < oo. 



Hence, 



(B.15) 



r„(0i) - 1+ \h < e ± f Att - 1 -h < C 1/2 (K), 



which implies H < 2{Ci/ 2 {K) — T n (9i) + 1) and so also, 



(B.16) / Att < ± f V n+^C 1/2 (K) < ^(C 1/2 (K)-T n (9 1 ) + 1) + ^C 1/2 (K). 



Finally, notice that since $h_n\to h$ in both $L^1$ and $L^\infty$ norms, we can find a function $g$ such that $\sup_nh_n \le g$ and $\|g\|_{L^1} < \infty$, and thus

(B.17) $\Gamma_n(\theta_1) \ge \Gamma_n(-K) \ge \Gamma_g(-K)$,

where $\Gamma_g$ denotes the case when the rate function is still $\lambda(\cdot)$ but the exciting function is $g(\cdot)$ instead of $h_n(\cdot)$. Notice that here $\|g\|_{L^1} < \infty$ but may not be less than 1. It is still well defined because of the assumption $\lim_{z\to\infty}\frac{\lambda(z)}{z} = 0$. Indeed, we can find $\lambda(z) = \nu_\epsilon+\epsilon z$ that dominates the original $\lambda(\cdot)$ for $\nu_\epsilon > 0$ big enough and $\epsilon > 0$ small enough so that $\epsilon\|g\|_{L^1} < 1$. Now, we have $\Gamma_g(-K) \ge \Gamma_{\nu_\epsilon,\epsilon}(-K)$ which is finite, where $\Gamma_{\nu_\epsilon,\epsilon}(-K)$ corresponds to the case when $\lambda(z) = \nu_\epsilon+\epsilon z$. Hence,

(B.18) $\sup_{(\hat\lambda,\hat\pi)\in Q_e^*}\int\hat\lambda\hat\pi \le C(K)$,

for some $C(K) > 0$ depending only on $K$.

□



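Lemma 34. Assume that $\lambda(\cdot) \ge c$ for some $c > 0$, $\lim_{z\to\infty}\frac{\lambda(z)}{z} = 0$ and $\lambda(\cdot)^\alpha$ is Lipschitz with constant $L_\alpha$ for any $\alpha \ge 1$. Then for any $K > 0$, $\Gamma_n(\theta)$ is Cauchy with $\theta$ uniformly in $[-K,K]$.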

Proof. Let us write H n (t) = Ylr <t hn(t — Tj). Observe first, that for any q, 



(B19) - < « L log (™) dN - ~ I (&£* ~ mM) ) ds 



is a martingale under P n . By Holder's inequality, for any p, q > 1 with - + - = 1, 



(B.20) 

E Pm [e m ] 



E p " 
E Pn 



7 6N t d^rn 
dP n 



< E p " e 



e 0N t -f*(\(H m (s))-\(H n (s)))ds-f*lo g (^^±)dN s 
P 8N t -pft(\(H m (s))-\(H n (s)))ds] 1/P E Pn [ e 9/ *log(^W))dAr s ] 1/q 



By the Cauchy-Schwarz inequality, 



(B.21) E p " 



HHn(s)) . 



,1 So lo S 



dN s 



1/9 



< E 



Pn 



X^ffi^-x**™)* 



iq 



< E 



Pn 



ec 



^q=X L 2q So T. T <s \h m {s~r)-h n (s-T)\ds 



■±<l 



< W n e^ 



TTL2 q \\h rn -h n \\ L iN t 2 i 



We also have 



(B.22) E 1 



^0JVt-p/ o <(A(H m ( S ))-A(If n ( S )))<te 



Vp 



< E p " [. 



^JVe+pI-illhm-ftnlliiJVtl X /P 



Therefore, by Lemma 33 and the fact T n (0) = for any n, we have 



(B.23) 

r m (e)-r n (9) 



T n {9) 



< C{K)L 1 e m , n + 



l f + -r n ( P e) - -r n (9) + (i--) |r r 
j. p p \ pJ 



C(K) Li2q^rn,n t J 

g(g) L 2g e m , w | C{K){jp-l)K 
2q c 2 ^ 1 p 



+ 



K) 



C(K)K, 



where $\epsilon_{m,n} = \|h_m-h_n\|_{L^1}$. Hence,

(B.24) $\limsup_{m,n\to\infty}\{\Gamma_m(\theta)-\Gamma_n(\theta)\} \le 2\left(1-\frac{1}{p}\right)C(K)K$,

which is true for any $p > 1$. Letting $p\downarrow 1$, we get the desired result.

□



Remark 16. If $\lambda(\cdot) \ge c > 0$ and $\lim_{z\to\infty}\frac{\lambda(z)}{z^\alpha} = 0$ for any $\alpha > 0$, then $\lambda(\cdot)^\alpha$ is Lipschitz for any $\alpha \ge 1$. For instance, $\lambda(z) = [\log(z+c)]^\beta$ satisfies the conditions if $\beta > 0$ and $c > 1$.

Theorem 32. Assume that $\lambda(\cdot) \ge c$ for some $c > 0$, $\lim_{z\to\infty}\frac{\lambda(z)}{z} = 0$ and $\lambda(\cdot)^\alpha$ is Lipschitz with constant $L_\alpha$ for any $\alpha \ge 1$. Then

(B.25) $\lim_{t\to\infty}\frac{1}{t}\log E[e^{\theta N_t}] = \Gamma(\theta) = \lim_{n\to\infty}\Gamma_n(\theta)$,

for any $\theta\in\mathbb{R}$.



Proof. By Lemma 34, $\Gamma_n(\theta)$ tends to $\Gamma(\theta)$ uniformly on any compact set $[-K,K]$. Since $\Gamma_n(\theta)$ is Lipschitz by Lemma 33, it is continuous and the limit $\Gamma$ is also continuous. Let $\epsilon_n = \|h_n-h\|_{L^1} < \epsilon$. As in the proof of Lemma 34, for any $\theta\in[-K,K]$, $p, q > 1$, $\frac{1}{p}+\frac{1}{q} = 1$, we get



(B.26) limsup-logE[e 9Ar '] 

t— >OD t 

< T n (6) + C{K)L x e n 



C{K) L 2q e r 
2q ' 6 



2g-l 



2 1 



V 



C{K)K. 



Letting n — Y oo first and then p 1 1, we get limsup^^ | logEfe 61 ^] < r(#). 



Similarly, for any p',q' > 1 with 4 + ^ = 1, 



(B.27) r n (0) < liminf — \ogE[e {pe+pLien)Nt ] + liminf — logE 

t^oo pt t^oo 2qt 






< liminf — logE[e pp ' ejVt ] + liminf — logEfe 9 '^ 16 "^] 



+ lim inf — log E 

t^oo 2qt 



e^ 



%N t 



Since we can dominate A(-) by the linear function X(z) = v + z in which case the 
limit of logarithmic moment generating function T u {9) is continuous in 8, we may 
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let n — y oo to obtain 



(B.28) 



r(9) < liminf — logE[e pp ' 9Af < 

t^oo pp>t 



This holds for any 9 and thus 



(B.29) 



1 / f) 

liminf - logE[e e7Vi l > pp'T — 

t^oo t \pp' 



Letting p, p' j. 1 and using the continuity of T(-), we get the desired result. □ 



Finally, let us prove Theorem 18.

Proof of Theorem 18. For the upper bound, apply the Gärtner-Ellis theorem. Let us prove the lower bound. Let $B_\epsilon(x)$ denote the open ball centered at $x$ with radius $\epsilon > 0$. By Hölder's inequality, for any $p, q > 1$ with $\frac{1}{p}+\frac{1}{q} = 1$,



(13.30) P n [^eB e (x)\ < 



dp n 



dP 



LP( 



P ( ^ G B e (x) 



1/9 



Therefore, letting t — > oo, we have 



(B.31) sup{6x-T n (9)} = lim ^logP n (^ G B e (x) 



< -Lripp'Lxen) + -^r ( 

pp Ipq \ 



L2pq'£n 

c ipq'-i 



+ - liminf - logP — G BJx 



q t-+oo t 



N t 



t 



where e n = \\h n — h\\ L i. Hence, letting n — >■ oo, see that 



(B.32) 



- liminf - logP I — £ BJx) ) > limsupsup{#a; — T n (9)}. 
q t^oo t \ t ' 
Since $\Gamma_n(\theta)\to\Gamma(\theta)$ uniformly on any compact set $K$,

(B.33) $\sup_{\theta\in K}\{\theta x-\Gamma_n(\theta)\} \to \sup_{\theta\in K}\{\theta x-\Gamma(\theta)\}$,

as $n\to\infty$ for any such set $K$. Notice that $\lambda(\cdot) \ge c > 0$ and recall that the limit for the logarithmic moment generating function with parameter $\theta$ for a Poisson process with constant rate $c$ is $(e^\theta-1)c$. Hence

(B.34) $\liminf_{\theta\to+\infty}\frac{\Gamma_n(\theta)}{\theta} \ge \liminf_{\theta\to+\infty}\frac{(e^\theta-1)c}{\theta} = +\infty$,

which implies that $\sup_{\theta\in\mathbb{R}}\{\theta x-\Gamma_n(\theta)\} \to \sup_{\theta\in\mathbb{R}}\{\theta x-\Gamma(\theta)\}$. Therefore,



1,. . .1. _/iV t 



(B.35) - liminf - logP — E B e (x) > sup{9x - T(9)}. 

q t^oo t \ t 



Letting q J, 1, we get the desired result. □ 

Lemma 35. If $h(t) > 0$, $\int_0^\infty h(t)dt < \infty$, $h(\infty) = 0$, and $h$ is continuous, then $h$ can be approximated by a sum of exponentials both in $L^1$ and $L^\infty$ norms.

Proof. The Stone-Weierstrass theorem says that if $X$ is a compact Hausdorff space and $A$ is a subspace of $C(X)$ with the following properties: (i) if $f, g\in A$, then $f\times g\in A$; (ii) $1\in A$; (iii) if $x \ne y$ in $X$, then we can find an $f\in A$ such that $f(x) \ne f(y)$; then $A$ is dense in $C(X)$ in $L^\infty$ norm. Consider $X = \mathbb{R}^+\cup\{\infty\} = [0,\infty]$, where $C[0,\infty]$ consists of continuous functions vanishing at $\infty$ and the constant function 1.

By the Stone-Weierstrass theorem, the linear combinations of $1$, $e^{-t}$, $e^{-2t}$, etc. are dense in $C[0,\infty]$. In other words, for any continuous function $h$ in $C[0,\infty]$ and any $\epsilon > 0$, we have, for some $n$ and coefficients $a_0,\ldots,a_n$,

(B.36) $\sup_{t\ge0}\left|h(t)-\sum_{j=0}^na_je^{-jt}\right| \le \epsilon$.

In fact, since $h(\infty) = 0$, we get $|a_0| \le \epsilon$. Thus

(B.37) $\sup_{t\ge0}\left|h(t)-\sum_{j=1}^na_je^{-jt}\right| \le 2\epsilon$.

However, $\sum_{j=1}^na_je^{-jt}$ may not be positive. We can approximate $\sqrt{h(t)}$ first by a sum of exponentials and then approximate $h(t)$ by the square of that sum of exponentials, which is again a sum of exponentials but positive this time.

Indeed, we can approximate $h(t)$ by the sum of exponentials in $L^1$ norm as well. Suppose $\|h-h_n\|_{L^\infty}\to0$, where $h_n$ is a sum of exponentials. Then, by the dominated convergence theorem, for any $\delta > 0$, $\int_0^\infty|h-h_n|e^{-\delta t}dt\to0$ as $n\to\infty$. Thus, we can find a sequence $\delta_n > 0$ such that $\delta_n\to0$ as $n\to\infty$ and $\int_0^\infty|h-h_n|e^{-\delta_nt}dt\to0$. By the dominated convergence theorem again, $\int_0^\infty h(1-e^{-\delta_nt})dt\to0$. Hence, we have $\int_0^\infty|h-h_ne^{-\delta_nt}|dt\to0$ as $n\to\infty$, where $h_ne^{-\delta_nt}$ is a sum of exponentials.

We will show that $h_ne^{-\delta_nt}$ converges to $h$ in $L^\infty$ as well.

(B.38) $\|h-h_ne^{-\delta_nt}\|_{L^\infty} \le \|h-h_n\|_{L^\infty}+\|h_n-h_ne^{-\delta_nt}\|_{L^\infty}$.

Notice that $(1-e^{-\delta_nt})h_n \le (1-e^{-\delta_nt})(h(t)+\epsilon)$. Since $h(\infty) = 0$, there exists some $M > 0$ such that for $t > M$, $h(t) \le \epsilon$, so that $(1-e^{-\delta_nt})h_n \le 2\epsilon$ for $t > M$. For $t \le M$, $(1-e^{-\delta_nt})h_n \le (1-e^{-\delta_nM})(\|h\|_{L^\infty}+\epsilon)$, which is small if $\delta_n$ is small. □